Master Thesis 2014
Cross-Domain Investigations of User Evaluations
under the Multi-cultural Backgrounds
Submission Date: July 21st 2014
Supervisor: Hayato Yamana
Department of Computer Science and Engineering,
Graduate School of Fundamental Science and Engineering,
Waseda University
Student ID: 5112BG10-8
Jiawen Le
Abstract
Twitter, one of the most popular social network services, is now widely used to survey public opinion. In this research, a large corpus of Twitter data, together with reviews collected from review websites, is used to carry out sentiment and culture-based analysis, in order to identify the cultural effects on user evaluations. Posts written in more than 30 languages from more than 30 countries are analyzed.
As the first step, global restaurants are taken as the research subject. Using a range of new and standard features, a series of high-performance classifiers is trained and applied in the later steps of sentiment analysis. A field expansion is then carried out to confirm that the same approach can be applied to world attractions. The experimental and analytical results show that the proposed methods are promising and transferable across fields for cross-lingual sentiment analysis. Furthermore, the analysis shows that cultural effects on user evaluations exist in both the restaurant and travel domains, and are pronounced for some countries and cultural backgrounds.
Contents
1. Introduction .......................................................................................................... 1
2. Background ........................................................................................................... 3
2.1 Twitter Data ............................................................................................................... 3
2.2 Review Data ............................................................................................................... 4
2.3 Machine Translation Tools......................................................................................... 5
3. Related Work ........................................................................................................ 7
3.1 Sentiment Analysis ..................................................................................................... 7
3.2 Cross-lingual Analysis ............................................................................................... 8
4. Methodology .......................................................................................................... 9
4.1 Data Collection ........................................................................................................... 9
4.2 Translation and Pre-filtering ...................................................................................... 9
4.3 Location Definition .................................................................................................. 10
4.4 Spam Filtering .......................................................................................................... 10
4.5 Features for Sentiment Classification ...................................................................... 12
4.5.1 Dictionary Construction ................................................................................................. 12
4.5.2 Syntax Features .............................................................................................................. 12
4.5.3 Modified Unigram ......................................................................................................... 12
4.5.4 Review Dataset-based Average Score ........................................................................... 13
4.5.5 Review Dataset-based CCA Score ................................................................................ 14
4.5.6 Window Co-occurrence-based Average Score .............................................................. 14
4.5.7 POS-based Feature ......................................................................................................... 15
5. Experiment .......................................................................................................... 16
5.1 Overview .................................................................................................................. 16
5.2 Preprocessing ........................................................................................................... 16
5.3 Sentiment Classification ........................................................................................... 17
6. Analysis ................................................................................................................ 20
6.1 Statistical Analysis ................................................................................................... 20
6.2 Basic Sentiment Analysis ......................................................................................... 21
6.2.1 Polarity Distribution ...................................................................................................... 21
6.2.2 Sentiment Map ............................................................................................................... 23
6.2.3 Sentiment Keywords Extraction .................................................................................... 25
6.2.4 (Attribute, Value) Pairs Extraction ................................................................................ 28
6.3 Culture-based Analysis ............................................................................................ 32
7. Field Expansion................................................................................................... 38
7.1 Data .......................................................................................................................... 38
7.2 Experiment ............................................................................................................... 38
7.3 Analysis .................................................................................................................... 39
7.3.1 Statistical Analysis ......................................................................................................... 39
7.3.2 Basic Sentiment Analysis .............................................................................................. 41
7.3.2.1 Polarity Distribution ...................................................................... 41
7.3.2.2 Sentiment Map .............................................................................. 44
7.3.2.3 Sentiment Keywords Extraction ................................................... 48
7.3.2.4 (Attribute, Value) Pairs Extraction ............................................... 53
7.3.3 Culture-based Analysis .................................................................................................. 57
7.3.4 Comparison of the Two Domains .................................................................. 58
8. Conclusion ........................................................................................................... 62
Acknowledgements .................................................................................................... 63
References ................................................................................................................... 64
Publications ................................................................................................................ 66
1. Introduction
In recent years, the social network service (SNS), a newcomer in the field of social media, has drawn much attention around the world. Twitter1, one of the most popular social network services, has a range of distinctive characteristics, including the tremendous number of posts, the great variety of tweet contents, and the rapid speed of information distribution. By posting tweets, people may discuss currently 'hot' topics, express their views on big events, or simply talk about trivial things in their daily life, such as an enjoyable trip, a delicious meal, or a satisfactory service they have received. The huge volume of tweets can therefore be used to survey public opinion: if many users post tweets containing complimentary words about a restaurant, it is likely that this restaurant is popular among customers.
Meanwhile, recent decades have also witnessed the remarkable progress of globalization. With the growing number of transnational enterprises and the development of transportation services, people from all over the world can use the same product, savor the same meal, and appreciate the same scenery. However, it is quite common for people from different countries to have totally different feelings about these experiences, probably due in part to their diverse cultural backgrounds. To identify the cultural effects on the evaluations of people with different cultural backgrounds, tweets, as well as website reviews, can serve as a good dataset for the analysis. By applying natural language processing techniques and sentiment analysis approaches to this dataset, conjectures about the possible relationship between user opinions and cultural backgrounds can be made, and further, instructive suggestions can be given.
However, this task poses several challenges. After retrieving the useful information from the large amount of varied data, the problem of how to correctly determine the sentiment of these short texts remains the main task for many researchers. On this task, a noted work by Liu [1] reviews the existing approaches and research in the field of sentiment analysis. Based on his survey and other recent research, the mainstream approaches for sentiment analysis include the 2-way method that classifies tweets as positive or negative [2][3], and the 3-way method that divides tweets into positive, neutral, and negative groups [4][5][6].
The language barrier is another challenge. Most previous research in this field focuses only on English-written tweets, and posts in other languages are simply discarded. Although English-written tweets predominate in number, Twitter also enjoys great popularity in many non-English-speaking countries, such as Japan, Indonesia, and Brazil. Ignoring these non-English tweets would inevitably lead to biased and incorrect results in cross-culture analysis. Several works have studied cross-lingual sentiment analysis, but their target languages and text formats are very limited. OpinionIt [7] is an opinion mining system comparing cross-lingual differences in opinions; in that paper, the authors take reviews written in English and Chinese as the main subject. Other related research includes the work of Bautin et al. [8] and the work of Nakasaki et al. [9], which study blogs or news in multiple languages.
One more challenge lies in field transferability. Most research in sentiment analysis considers only a single domain, and some of the most popular domains for this kind of study
1 http://twitter.com/
include the domain of films [6] and political issues [10]. Because texts in different domains may have different vocabularies and stylistic features, it is questionable whether a sentiment analysis approach developed in one field can be applied to another.
Facing these challenges, this research makes the following contributions:
- Considering the multi-cultural background, tweets written in more than 30 languages from more than 30 countries are taken into account in the sentiment analysis;
- A domain expansion is carried out: after applying the proposed sentiment analysis approach to the field of global restaurants, tweets about world attractions are analyzed with the same approach as a further step;
- A sequential three-step process of spam classification, subjectivity classification, and polarity classification is adopted, and in the sentiment classification steps, a series of combinations of new features is used to train high-performance classifiers;
- Through a range of analysis methods, insight into people's attitudes towards the target restaurants and attractions is given, and informative conclusions about the cultural effects are obtained.
As mentioned above, global restaurants are chosen as the starting point of this research. One of the firmest grounds for this choice is that food plays an important role in multi-culture comparison, and a culture's perception or standard of food may manifest itself in people's attitudes towards global restaurant chains. Tourism is selected as the expansion domain because an independent, distant field is preferred: verifying the transferability of the proposed approach is one of the objectives of this research. Among all the possible distant fields, the tourism domain provides a rich source of review data, which is essential to the proposed methodology and the relevant analysis.
The rest of this thesis is organized as follows. In the second section, basic background knowledge is introduced briefly. After listing the related works in the next section, the concrete methods and algorithms adopted in this research are discussed in detail. The process and steps of the experiment are then described, and the obtained results are discussed and analyzed, leading to the final conclusions and summary in the last section.
2. Background
In this section, basic background knowledge for this research is introduced.
2.1 Twitter Data
Twitter, the main data source of this research, is a popular online microblogging service that allows users to post messages of up to 140 characters. People use Twitter to read the 'tweets' sent by users whom they follow, and to post their own tweets to the group of users who follow them. Because of its enormous influence as a social medium, Twitter is widely used as a research tool for various investigations, which generally rely on the Twitter APIs2.
Twitter provides developers with a range of APIs to meet different needs, which fall into two groups: the REST APIs and the Streaming APIs. All simple functionality can be achieved with the first group, but for more powerful real-time applications, users usually resort to the Streaming APIs. In this research, both the search API (part of the REST APIs) and the public stream API (part of the Streaming APIs) are used to satisfy multiple requirements. For clarity, a comparison between the two APIs is presented in Table 2.1.
Table 2.1: Twitter API Examples

                 search/tweets                           statuses/filter
category         REST API                                Streaming API
parameters       q: query text (required)                follow: the users to return statuses for
(part of)        geocode: latitude/longitude             track: keywords to track
                   restriction                           locations: a set of bounding boxes
                 lang: language restriction                to track
                 since_id: returns results with an ID    (at least one of these three parameters
                   greater than the specified ID           must be specified)
                 until: returns tweets generated
                   before the given date
response object  tweets                                  tweets
HTTP methods     GET                                     GET, POST
resource URL     https://api.twitter.com/1.1/            https://stream.twitter.com/1.1/
                   search/tweets.json                      statuses/filter.json
rate limitation  per 15 minutes:                         limitations for long-lived connections
                   180/user, 450/app
After a request, both APIs return tweets as the response object, which constitute the basic data form of this research. A tweet object actually contains far richer content than what can be read from the ordinary Twitter user interface. Several important fields that are particularly useful in this research are picked out and listed in Table 2.2.
2 https://dev.twitter.com/docs/.
Table 2.2: Tweet Object (part of)

field                 description
id                    unique integer identifier for the Tweet
created_at            creation time
text                  text of the status update
lang                  machine-detected language of the text
coordinates           geographic location of the Tweet
retweet_count         number of times the Tweet has been retweeted
place.country         full name of the country
place.country_code    ISO code of the country
user.name             user name
user.id               user identification number
user.lang             the user-set language
user.location         the user-set location
user.time_zone        the user-set time zone
The Twitter dataset in this research contains millions of tweets, each constructed from the same fields and their corresponding values. In addition to the actual post text given by the crucial 'text' field, tweets also provide valuable language and location information through fields such as 'lang', 'coordinates', 'country', and 'location'.
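As a concrete illustration, the fields in Table 2.2 can be pulled out of a raw tweet JSON object as follows. This is a minimal sketch; the sample object and all its values are invented for illustration:

```python
import json

# A fabricated tweet object (values invented) restricted to the
# fields listed in Table 2.2.
raw = '''{
  "id": 123456789,
  "created_at": "Mon Oct 07 20:15:00 +0000 2013",
  "text": "Great coffee at Starbucks!",
  "lang": "en",
  "coordinates": null,
  "retweet_count": 3,
  "place": {"country": "Japan", "country_code": "JP"},
  "user": {"name": "alice", "id": 42, "lang": "ja",
           "location": "Tokyo", "time_zone": "Asia/Tokyo"}
}'''

def extract_fields(tweet_json):
    """Keep only the Table 2.2 fields, tolerating missing sub-objects."""
    t = json.loads(tweet_json)
    place = t.get("place") or {}
    user = t.get("user") or {}
    return {
        "id": t["id"],
        "text": t["text"],
        "lang": t.get("lang"),
        "coordinates": t.get("coordinates"),
        "country": place.get("country"),
        "country_code": place.get("country_code"),
        "user_lang": user.get("lang"),
        "user_location": user.get("location"),
    }

fields = extract_fields(raw)
print(fields["country"], fields["user_lang"])  # Japan ja
```

Guarding against null 'place' and 'coordinates' matters in practice, since most tweets carry no geotag (cf. the definition ratios in Table 4.1).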
2.2 Review Data
Another important data source is review websites. After a meal or a sightseeing tour, people often leave comments on gourmet or tourism sites to express their feelings and give others suggestions. Such comments are usually accompanied by a corresponding score given by the same user.
Figure 2.1: Example of Review Data
Figure 2.1 shows a screenshot of a review, with the typical components of a comment and a 5-point score. In this example, the user commented on the Eiffel Tower with several lines of text and scored it five circles, which means excellent.
In this research, this kind of review data is also collected as an auxiliary dataset for the experiment, which will be explained in detail in the later sections.
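For the later sentiment experiments, such 5-point scores can be folded into coarse polarity labels. The thresholds below (negative for 1-2, neutral for 3, positive for 4-5) are an illustrative assumption, not necessarily the exact scheme used in this research:

```python
def score_to_polarity(score):
    """Map a 1-5 review score to a coarse polarity label.
    Thresholds are illustrative: <=2 negative, 3 neutral, >=4 positive."""
    if not 1 <= score <= 5:
        raise ValueError("score must be in 1..5")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"

labels = [score_to_polarity(s) for s in (5, 3, 1)]
print(labels)  # ['positive', 'neutral', 'negative']
```

Such a mapping lets the scored reviews serve as weakly labeled training data alongside the tweets.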
2.3 Machine Translation Tools
In research related to natural language processing and cross-lingual investigations, machine translation systems are commonly used to produce preprocessing or intermediate results. Examples of this kind of research include speech recognition and translation systems, webpage translation, and large-scale domain-specific document translation. Strictly speaking, machine translation is a sub-field of computational linguistics, but this research only considers the use of online machine translation tools, which serve as a critical step in the cross-lingual analysis.
A range of machine translation tools is available for research or commercial use. These tools differ in their language models, supported platforms, source availability, etc., and users may choose the appropriate one according to their objectives. Table 2.3 gives three examples of popular machine translation applications and their related information. A more complete and detailed table can be found under 'Comparison of machine translation applications' on Wikipedia3.
Table 2.3: Example Machine Translation Tools

name               platform            license     price                  source availability
Google Translate4  cross-platform      paid        free                   no
                   (web application)
Bing Translator5   cross-platform      commercial  free                   no
                   (web application)
Babylon6           Windows, Mac        paid        depends on license     no
                                                   (from $34 to $89
                                                   for one license)
When choosing a machine translation system, performance, i.e., translation accuracy, is a common criterion. Several previous reports may serve as references in this respect. In 'Comparison of online machine translation tools'7, the researchers asked professional and non-professional people to review the three tools mentioned above, and these reviews are regarded as user evaluations. The conclusions are that Google Translate outperforms the other two systems in most situations, and that the same translator performs quite differently under different language and text-form settings. In 'An analysis of Google Translate accuracy'8, the researchers arrive at the conclusion that although Google Translate provides translations
3 http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications/.
4 https://translate.google.com/.
5 http://www.bing.com/translator/.
6 http://www.babylon.com/.
7 http://82.165.192.89/initial/index.php?id=175.
8 http://www.translationdirectory.com/articles/article2320.php/.
among a wide range of languages, the accuracy varies greatly. The results suggest that translations between European languages are quite good, but performance becomes relatively poor when Asian languages are involved. Overall, although its performance is never likely to reach that of an expert human translator, it can provide quick, cheap translations even for unusual language pairs.
Based on these evaluations and conclusions, Google Translate is chosen as the tool for the translation tasks in this research.
Finally, the advantages and disadvantages of using a machine translation tool are summarized below.
Merits:
a) A wide range of existing applications to choose from, covering all platforms;
b) APIs provided by some popular online systems for easy access;
c) Relatively high performance for certain languages and certain forms of text.
Demerits:
a) Strict rate limits, or availability only for commercial use;
b) Not capable of handling all languages and all text forms;
c) Usually cannot be specialized for certain domains or fields.
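One common way to soften demerit a) is to cache translations so that each distinct text is sent to the service only once. The sketch below assumes a `translate` callable is supplied by the caller (e.g. a client for an online translation API); the `fake_translate` stub here is purely for illustration:

```python
def make_cached_translator(translate):
    """Wrap a translation function with a cache so that repeated
    texts do not consume extra API quota."""
    cache = {}
    calls = {"count": 0}  # how many real API calls were made

    def cached(text, target="en"):
        key = (text, target)
        if key not in cache:
            calls["count"] += 1
            cache[key] = translate(text, target)
        return cache[key]

    return cached, calls

# Illustrative stub standing in for a real machine translation API.
def fake_translate(text, target):
    return f"[{target}] {text}"

cached, calls = make_cached_translator(fake_translate)
cached("bonjour", "en")
cached("bonjour", "en")   # served from the cache; no second API call
print(calls["count"])     # 1
```

Since tweets about chain restaurants repeat heavily (retweets, templated check-ins), deduplicating before translation substantially reduces the number of rate-limited API calls.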
3. Related Work
3.1 Sentiment Analysis
The sentiment analysis of Twitter data has attracted many researchers in recent years, and a range of significant works have contributed to this field.
In the aspect of opinion mining, a noted work is presented by Pang and Lee [11], which gives a broad view of existing approaches for sentiment analysis and opinion retrieval. Early research that proposes new methods, or improves existing approaches for the particular subject of tweets, can be listed as follows.
Go et al. [12] use emoticons to query Twitter and take the returned data as their training set. They then divide these tweets into negative and positive ones according to the sentiment of the query emoticons. As for the applied models, they build Naive Bayes, MaxEnt, and Support Vector Machine (SVM) classifiers, and report that the SVM model outperforms the others. They also find that the unigram feature model achieves the best performance, which the bigram and part-of-speech feature models cannot match.
The work of Pak and Paroubek [4] is characterized by its method of collecting objective training data. The sources of this objective data include several popular newspapers, whose sentences are usually considered to carry no particular sentiment polarity. In contrast with the conclusion of Go et al., Pak and Paroubek report that the n-gram and POS strategies both contribute to performance.
On the other side, the research of Barbosa and Feng [13] mainly focuses on syntax features such as hashtags, URL links, and exclamations, combined with the POS model.
In the work of Kouloumpis et al. [14], the authors explore the utility of linguistic features for detecting the sentiment of tweets. They use a supervised approach to evaluate the usefulness of existing lexical resources and of the informal and creative features found on Twitter. Their experimental results show that POS features are not as useful as features from an existing sentiment lexicon when combined with the microblogging features.
Agarwal et al. [2] examine sentiment analysis on Twitter with tree kernel and feature-based models. They find that both models outperform the unigram baseline, and that the most important features are combinations of the prior polarity of words and their POS tags. They also tentatively conclude that sentiment analysis of Twitter data is not that different from sentiment analysis of other genres.
In the paper presented by Saif et al. [15], the authors introduce a novel approach that adds semantics as additional features to the training set for sentiment analysis. They compare the semantic features with the unigram, POS-sequence, and sentiment-topic features, and find that the semantic feature model outperforms the unigram and POS baselines for identifying negative and positive sentiment.
Inspired by findings in the social sciences, the work of Hu et al. [16] investigates the utility of social relations in sentiment analysis. The authors propose a high-performance framework for handling noisy and short tweets, and conclude that user-centric social relations are quite useful for sentiment classification of Twitter data.
All the above-mentioned works consider only ordinary English tweets, and have not touched upon cross-cultural backgrounds.
3.2 Cross-lingual Analysis
In the field of cross-lingual sentiment analysis, the noted opinion analysis system Oasys, proposed by Cesarano et al. [17], allows users to observe how the intensity of opinion varies across countries and news sources.
In the work of Abbasi et al. [18], sentiment analysis methodologies are proposed for the classification of web forum opinions in English and Arabic. A range of stylistic and syntactic features is evaluated, and the proposed sentiment classification methods are shown to be useful for document-level sentiment analysis.
Guo et al. [7] focus on extracting customers' opinions from reviews and predicting their sentiment orientation in multiple languages. They present an aspect-oriented opinion mining method based on a Cross-lingual Latent Semantic Association model, and report that the proposed method achieves better performance than existing approaches.
The work of Cui et al. [19] uses emotion tokens for cross-lingual sentiment analysis. The authors argue that emotion tokens are commonly used on Twitter and directly express one's emotion regardless of language. They compare their approach with a semantic-lexicon-based approach and several web services on the sentiment analysis task for multilingual tweets, and demonstrate the effectiveness of the proposed algorithms.
Gao et al. [20] study Twitter and its Chinese counterpart, Sina Weibo, and make a series of simple statistical comparisons in several aspects, such as the characteristics of user behavior and the content of messages.
Compared to all these works, this research focuses on the analysis of cross-lingual user evaluations, based on sentiment classification over a dataset of tweets and Web reviews. More than 30 languages and more than 30 countries are taken into account, so as to obtain more authentic and comprehensive results for culture-based analysis. The approach is then extended to the travel field, which verifies the transferability of the proposed cross-cultural sentiment analysis methods.
4. Methodology
In this section, the approaches and algorithms applied to the restaurant-domain datasets are described; the same pipeline is reused in the later field-expansion step.
The new points proposed in this section can be briefly summarized as follows:
- In the location definition step, both a manually constructed dictionary and an existing API are used, so as to resolve the location of a larger proportion of tweets despite the noisy location data in the original tweets;
- In the spam filtering step, a series of traditional and new features is applied to train high-performance spam classifiers;
- In the feature selection step of the sentiment classification, in addition to the commonly used syntax features, the traditional unigram method is modified to overcome word sparsity, and the review data is fully utilized to create novel features. A classical statistical method is also adapted to this task, and popular NLP tools are harnessed to produce candidate feature groups.
4.1 Data Collection
The data used in this research mainly comes from two sources: Twitter and restaurant review websites.
First, as for the Twitter data, 9,523,211 restaurant-related tweets were gathered over 4 months (Sep. 2013 to Dec. 2013) using the Twitter Streaming API and Search API introduced in Section 2.1. The data collection was restricted to the names of the target restaurants (i.e., McDonald's, KFC, Burger King, Pizza Hut, Subway, and Starbucks), translated into multiple languages.
Then, as an auxiliary dataset, the review dataset was constructed by collecting English-written reviews from some popular review websites9,10,11. The reviews include the text comments and their corresponding scores, as explained in Section 2.2. In total, 55,031 reviews were collected in this step.
4.2 Translation and Pre-filtering
In this research, 34 languages (i.e., en, es, id, ja, fr, pt, tl, ru, tr, zh, ar, th, et, nl, it, de, ko, bg, sv, pl, vi, sk, da, ht, lt, lv, sl, fi, is, no, fa, hu, el, uk12) are taken as the target languages. The selection is based on tweet volume, speaker population, and whether the language can be handled by machine translation tools. As mentioned in Section 2.1, the original Twitter data gives the machine-detected language of the text in the field 'lang', and this is simply used as the recognized language of the tweet in this research. Tweets that cannot be correctly translated into English are discarded.
The remaining data is then filtered by the pre-defined condition of being related to restaurants, as determined by a list of restaurant-related words with the highest frequencies in the
9 http://www.tripadvisor.com/.
10 http://www.yelp.com/.
11 http://www.zagat.com/.
12 http://en.wikipedia.org/wiki/ISO_639-1/.
review dataset. The 45 most frequent such words are selected as the restaurant-related words used to filter the original Twitter data.
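The pre-filtering step can be sketched as a simple keyword match over the translated text. The word list below is a stand-in; the actual list is the 45 most frequent restaurant-related words from the review dataset:

```python
# Stand-in keyword list; the real list is the 45 most frequent
# restaurant-related words from the review dataset.
RESTAURANT_WORDS = {"food", "menu", "burger", "coffee", "pizza", "service"}

def is_restaurant_related(translated_text):
    """Keep a tweet if its English translation contains any keyword."""
    tokens = translated_text.lower().split()
    return any(tok.strip(".,!?") in RESTAURANT_WORDS for tok in tokens)

tweets = [
    "The coffee at Starbucks was amazing!",
    "Stuck in traffic again...",
]
kept = [t for t in tweets if is_restaurant_related(t)]
print(len(kept))  # 1
```

Matching on the translated text rather than the original lets one keyword list cover all 34 target languages.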
4.3 Location Definition
A location name dictionary (with 1836 entries) is manually constructed by referring to online sta-
tistics13
, and is used to query the names of counties or cities appeared in the location-related items
of tweets (e.g. the ‘location’ item in user profile). Then Yahoo yql API14 is used to parse these items
of the remained undefined tweets again to obtain more definitions. After these two steps, the ratio
of the tweets that have been labeled with location names is 72.8%. This part of tweet data is further
used in the later steps.
To clarify the effectiveness of the proposed location definition method, Table 4.1 compares the definition ratios obtained by using only the direct information in the tweet fields with those obtained by also using the location name dictionary and the yql API. In the table, 'coordinates' in the reference column means using the direct information in the 'coordinates' field of a tweet, and 'coordinates' + 'place' means using the direct information in both the 'coordinates' and 'place' fields. By using the information in the 'coordinates' and 'place' fields together with the manually constructed dictionary and the yql API, a definition ratio of 72.8% is obtained, as presented in the last row of the table.
Table 4.1: Location Definition Ratio Comparison
reference definition ratio
‘coordinates’ 7.0%
‘place’ 6.1%
‘coordinates’ + ‘place’ 7.3%
‘coordinates’ + ‘place’ + location dictionary 51.4%
‘coordinates’ + ‘place’ + location dictionary + yql API 72.8%
As can be inferred from the above table, if only the direct information in the location-related fields were used to define locations, most (approximately 92.7%) of the Twitter data would have to be discarded at this very early stage, which would clearly lower the efficiency of the experiment. By referring to the manually constructed location dictionary, the proportion of usable tweets rises by 44.1 percentage points, and by additionally applying the yql API, the number of tweets usable in later steps increases further.
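The two-step resolution above can be sketched as follows. The dictionary contents and the fallback hook are illustrative stand-ins: the actual dictionary has 1,836 entries, and the fallback is the Yahoo yql API, whose call details are omitted here.

```python
# Hypothetical miniature location dictionary: surface form -> country code.
# The thesis uses a 1,836-entry hand-built dictionary.
LOCATION_DICT = {
    "tokyo": "JP",
    "new york": "US",
    "london": "GB",
}

def resolve_by_dictionary(location_field):
    """Step 1: look the free-text 'location' field up in the dictionary."""
    text = location_field.lower().strip()
    for name, country in LOCATION_DICT.items():
        if name in text:
            return country
    return None

def resolve_location(location_field, api_lookup=None):
    """Dictionary first; fall back to an external geocoding call
    (a stand-in for the yql API) only for still-undefined tweets."""
    country = resolve_by_dictionary(location_field)
    if country is None and api_lookup is not None:
        country = api_lookup(location_field)  # may also return None
    return country
```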
4.4 Spam Filtering
Strictly speaking, whether a tweet is spam in this research should depend on whether the tweet text contains useful information indicating subjective opinions towards the restaurants. In this research, however, a simpler spam filtering criterion is applied. First, advertisements and pure ‘check-in’ tweets are regarded as ‘spam’. In addition, tweets posted within a short time period that have exactly the same content are also considered ‘spam’.
13 http://en.wikipedia.org/wiki/Lists_of_cities_by_country 14 http://developer.yahoo.com/yql/.
A Bayesian classifier is used here, because Bayesian classification is usually robust to noisy information. The training features include the numbers of followers and friends of the user, the ratio of the follower count to the friend count, the registration date, the average numbers of new friends and followers per day, the latest 20 posted tweets, and some syntax characteristics such as at marks, hashtags, and URL links. Among the tweets whose location could not be defined in the last step, 1,200 tweets are randomly selected as the training set, and training and cross-validation are implemented over this set. Each tweet in this set is judged by three persons, and a majority vote decides whether the tweet is ‘spam’ or not. The resulting ‘spam’ classifier achieves an accuracy of 97.8% using all the proposed features. This trained classifier is then applied to the whole dataset to filter out the ‘spam’ tweets.
As further evidence of the effectiveness of the proposed features for spam classification, the accuracies of classifiers trained with different groups of features are compared in Table 4.2. Some of the feature groups are followed by bracketed enumerations of the concrete features actually used.
Table 4.2: Spam Classifier Performance Comparison
(each row adds the listed features to those of the row above)
features                                                           accuracy
syntax features (‘#’, ‘@’, URL, RT)                                89.3%
+ friend count, follower count                                     92.5%
+ time-period counts (new friends/day, new followers/day,
  tweets/day)                                                      94.4%
+ recent 20 tweets (‘#’, ‘@’, URL, RT counts in the 20 tweets,
  repeating tweets in the 20 tweets)                               97.8%
In addition to the traditional syntax features and friend/follower count features, which are commonly used in spam classification tasks, the time-based counts and the recent tweet stream prove important and helpful for identifying ‘spam’ tweets. As presented in Table 4.2, jointly considering the time period and the latest 20 tweets increases the accuracy of the spam classifier.
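A sketch of how such a feature vector might be assembled before training; the field names (`followers`, `friends`, `created`) and the helper are illustrative assumptions, not the actual Twitter API schema or the thesis implementation.

```python
import re
from datetime import date

def spam_features(tweet_text, user, recent_texts):
    """Assemble a spam-classifier feature vector of the kind described
    above.  `user` is assumed to be a dict with 'followers', 'friends',
    and 'created' (a datetime.date); these names are illustrative."""
    days_active = max((date.today() - user["created"]).days, 1)
    return {
        # syntax features of the tweet itself
        "hashtags": tweet_text.count("#"),
        "mentions": tweet_text.count("@"),
        "urls": len(re.findall(r"https?://\S+", tweet_text)),
        "retweet": int("RT" in tweet_text.split()),
        # account-level features
        "followers": user["followers"],
        "friends": user["friends"],
        "follower_friend_ratio": user["followers"] / max(user["friends"], 1),
        # time-normalized features
        "followers_per_day": user["followers"] / days_active,
        "friends_per_day": user["friends"] / days_active,
        # recent-tweet-stream feature: repeated contents are a spam signal
        "repeated_recent": len(recent_texts) - len(set(recent_texts)),
    }
```

A Bayesian classifier (as used in the thesis) would then be trained on these vectors over the 1,200 labeled tweets.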
Compared to a previous study15, the 9.8% proportion of ‘spam’ tweets in this research is relatively high. This may be explained by the specialized definition of a ‘spam’ tweet here, and by the special focus on the restaurant field. In this step, these 9.8% of ‘spam’ tweets are filtered out of the dataset.
15 http://www.pearanalytics.com/wp-content/uploads/2012/12/Twitter-Study-August-2009.pdf/.
4.5 Features for Sentiment Classification
4.5.1 Dictionary Construction
Before the feature selection step, two dictionaries are constructed.
First, a total word dictionary (tw_total_dict) records all the words that appear more than 3 times in the whole Twitter dataset, together with their occurrence frequencies. This dictionary turns out to contain 58,615 entries.
Then, an initial polarity dictionary (pol_dict_ini) is constructed by combining the entries of several popular, authoritative polarity dictionaries available on the Internet (Table 4.3). The entries in pol_dict_ini amount to 125,277 in total.
Table 4.3: The Structure of the Initial Polarity Dictionary
Label     Source
Positive  Positive Score > 0.75, or Positive Score − Negative Score > 0.5 (SentiWordNet16);
          Strong Positive (MPQA17);
          Positiv category (the General Inquirer18)
Negative  Negative Score > 0.75, or Negative Score − Positive Score > 0.5 (SentiWordNet);
          Strong Negative (MPQA);
          Negativ category (the General Inquirer)
Neutral   Positive Score = 0 and Negative Score = 0 (SentiWordNet)
4.5.2 Syntax Features
The special syntax characteristics of tweets cause inconvenience when preprocessing the tweet texts, but on the other hand they are quite informative for sentiment analysis.
In this research, 10 syntax characteristics in total (i.e. ‘!’, ‘?’, ‘#’, ‘@’, ‘RT’, upper-case words, capitalized words, URL links, emoticons, and slang words) are taken into consideration. Each characteristic is counted by its occurrences in one tweet, and the resulting 10-dimension vector is regarded as the ‘syn’ feature. A manually built emoticon dictionary (with 300 entries) and slang dictionary (with 200 entries) are referred to during the counting process.
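A minimal sketch of the ‘syn’ vector computation; the tiny emoticon and slang dictionaries below are illustrative stand-ins for the hand-built 300- and 200-entry ones.

```python
import re

# Hypothetical miniature emoticon and slang dictionaries.
EMOTICONS = {":)", ":(", ":D"}
SLANG = {"lol", "omg"}

def syn_feature(tweet):
    """Return the 10-dimension 'syn' vector: counts of '!', '?', '#', '@',
    'RT', upper-case words, capitalized words, URLs, emoticons, slang."""
    tokens = tweet.split()
    return [
        tweet.count("!"),
        tweet.count("?"),
        tweet.count("#"),
        tweet.count("@"),
        sum(t == "RT" for t in tokens),
        sum(t.isupper() and len(t) > 1 for t in tokens),
        sum(t[0].isupper() and not t.isupper() for t in tokens if t[0].isalpha()),
        len(re.findall(r"https?://\S+", tweet)),
        sum(t in EMOTICONS for t in tokens),
        sum(t.lower() in SLANG for t in tokens),
    ]
```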
4.5.3 Modified Unigram
Compared to the standard unigram model, an additional dimension reduction is applied when processing the modified unigram features, so as to alleviate the influence of word sparsity.
First, for each word in tw_total_dict, the polarity score is set to 2, -2, or 0 if it is labeled as Positive, Negative, or Neutral in pol_dict_ini respectively. Then all the tweets are parsed to calculate the PMI (Pointwise Mutual Information) values of all pairs of words in tw_total_dict. The PMI value of
16 http://sentiwordnet.isti.cnr.it/. 17 http://mpqa.cs.pitt.edu/. 18 http://www.wjh.harvard.edu/~inquirer/.
words 𝑤1 and 𝑤2 is given by

    PMI(𝑤1, 𝑤2) = log [ 𝑝(𝑤1, 𝑤2) / (𝑝(𝑤1) ∙ 𝑝(𝑤2)) ]

where 𝑝(𝑤1, 𝑤2) is the probability that words 𝑤1 and 𝑤2 co-occur in one tweet, and 𝑝(𝑤1) and 𝑝(𝑤2) are the probabilities that words 𝑤1 and 𝑤2 occur in one tweet respectively.
Then, for each word that does NOT appear in pol_dict_ini, its PMI values with the words in pol_dict_ini are sorted, and majority voting is carried out among the top 10 sorted items. A ‘positive-inclined’ word is then scored as 1, a ‘negative-inclined’ word as -1, and other words (i.e. those whose top 10 corresponding words are all from the Neutral category of pol_dict_ini) as 0. The output of this step is a new polarity dictionary (pol_dict) with the vocabulary of tw_total_dict, in which each word is mapped to a score on 5 scales (i.e. 2, 1, 0, -1, and -2). The comparison of the word counts for each scale before and after this step is shown in Table 4.4.
Table 4.4: The Comparison of Polarity Word Counts Between pol_dict_ini and pol_dict
score pol_dict_ini pol_dict
2 17,494 17,494
1 0 7,108
0 19,581 24,332
-1 0 4,957
-2 4,737 4,737
Total 58,615 58,615
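The PMI-and-vote step above can be sketched as follows. This is a simplified illustration: real runs operate over tw_total_dict and pol_dict_ini, here replaced by toy inputs, and the seed scores 2/0/-2 are reduced to a ±1 vote.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(tweets):
    """PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ) over per-tweet
    co-occurrence.  `tweets` is a list of token lists."""
    n = len(tweets)
    word_count = Counter()
    pair_count = Counter()
    for words in tweets:
        ws = set(words)
        word_count.update(ws)
        pair_count.update(frozenset(p) for p in combinations(sorted(ws), 2))
    pmi = {}
    for pair, c in pair_count.items():
        w1, w2 = tuple(pair)
        pmi[pair] = math.log(
            (c / n) / ((word_count[w1] / n) * (word_count[w2] / n))
        )
    return pmi

def vote_polarity(word, pmi, seed_scores, top_k=10):
    """Score an out-of-dictionary word 1 / -1 / 0 by majority vote over
    the seed polarities of its top-k PMI neighbours that appear in the
    seed dictionary (`seed_scores` maps word -> 2 / 0 / -2)."""
    neigh = [
        (v, (set(p) - {word}).pop())
        for p, v in pmi.items()
        if word in p and (set(p) - {word}).pop() in seed_scores
    ]
    top = sorted(neigh, reverse=True)[:top_k]
    balance = sum(1 if seed_scores[w] > 0 else -1 if seed_scores[w] < 0 else 0
                  for _, w in top)
    return 1 if balance > 0 else -1 if balance < 0 else 0
```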
Based on pol_dict, each tweet can be projected to a 5-dimension vector, in which each dimension records the count of the tweet’s unigram words in the corresponding score category. This vector is named the ‘5s’ feature.
Specifically, to verify the effectiveness of the proposed modified unigram feature in comparison with the standard unigram feature, the performance of a classifier trained with only this one feature (i.e. the standard unigram feature, or the modified unigram feature) is measured. The training and test set consists of 1,000 manually labeled tweets; for each setting, the SVM (Linear, RBF, and Polynomial) methods and the Naïve Bayes (Gaussian, Multinomial, and Bernoulli) methods are applied, and the highest accuracy is recorded in Table 4.5. For the more detailed training and test settings, please refer to Section 5.3.
Table 4.5: Standard Unigram Feature and Modified Unigram Feature Effectiveness Comparison
feature subjectivity classification accuracy polarity classification accuracy
Standard Unigram 44.8% 66.7%
Modified Unigram 65.6% 76.5%
As the table shows, using the modified unigram feature increases the accuracy of the classifier by a wide margin in both the subjectivity classification and the polarity classification case, thus proving the effectiveness of the proposed ‘5s’ feature.
4.5.4 Review Dataset-based Average Score
While users often express their opinions in their tweets, they also give clear evaluations of products and services on dedicated review websites. These reviews describe the user experience in much more detail, and usually come with a concrete score, most commonly on a 5-point scale. This kind of information can be quite useful if taken full advantage of.
In the previously constructed review dataset, each entry has the tuple structure (𝑡𝑒𝑥𝑡, 𝑠𝑐𝑜𝑟𝑒). In this step, all the text parts are first processed into a BoW model, and the total vocabulary of the review dataset is denoted 𝑊𝑟𝑣. For each word 𝑤𝑖 in 𝑊𝑟𝑣, the review dataset-based polarity score is calculated by

    pol_wi = ( Σ_{text_j ∈ TX_wi} score_j ) / |TX_wi|

where TX_wi is the set of review texts in which the word 𝑤𝑖 occurs, text_j is a review text in TX_wi, and score_j is the corresponding score of text_j.
Then, for each tweet 𝑡𝑤𝑖 in the Twitter dataset, the review dataset-based average score is given by

    avg_twi = ( Σ_{w_j ∈ W_twi} pol_wj ) / |W_twi|

where W_twi is the word set of 𝑡𝑤𝑖, and pol_wj is the polarity score of 𝑤𝑗 given by the last step.
In these two calculation steps, length normalization is applied: the occurrence count of the most frequent word in a review or in a tweet is normalized to 1. The float average score calculated by the second formula is named the ‘rv’ feature.
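The two formulas above can be sketched as follows; length normalization is omitted for brevity, and the function names are illustrative.

```python
from collections import defaultdict

def review_polarity_dict(reviews):
    """pol_wi = mean score of all reviews containing word wi (the first
    formula above).  `reviews` is a list of (text_tokens, score)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tokens, score in reviews:
        for w in set(tokens):          # one contribution per review text
            sums[w] += score
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}

def rv_feature(tweet_tokens, pol):
    """avg_twi = mean pol_wj over the tweet's words that have a
    review-based score (the 'rv' feature); 0.0 if none do."""
    scored = [pol[w] for w in tweet_tokens if w in pol]
    return sum(scored) / len(scored) if scored else 0.0
```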
4.5.5 Review Dataset-based CCA Score
Canonical correlation analysis (CCA) is a classical statistical method for discovering the latent relations among multiple variables. In previous work, CCA has been used in many fields, such as image retrieval [21], data clustering [22], and opinion mining [23].
In this case, each entry in the review dataset consists of a comment text and a 5-scale score, described by the format (𝑡𝑒𝑥𝑡, 𝑠𝑐𝑜𝑟𝑒). Since there must be some consistency between the comment text and the score given by the same person, it can be safely assumed that a latent relationship exists between them. Thus, the CCA method can be used here to capture the latent relationship between users’ sentiment and the polarity words. The first correlated variable is adopted as the measuring criterion: the review dataset is taken as the conditioning set, and the parameters of the first correlated variable are determined by the CCA process. Then, for each tweet in the Twitter dataset, the first correlated variable is calculated, and this float number is assigned to the tweet as the ’cca’ feature.
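Because the second view (the score) is one-dimensional here, the first canonical direction coincides with a regularized least-squares direction; the following numpy sketch exploits that simplification and is not the general CCA computation that would be needed for arbitrary views.

```python
import numpy as np

def cca_first_direction(X, y, reg=1e-6):
    """First canonical direction between a bag-of-words matrix X
    (n_reviews x n_words) and a 1-D score vector y.  With a one-
    dimensional second view this reduces to the (regularized)
    direction Sxx^-1 Sxy; the small ridge term keeps Sxx invertible
    for sparse word counts."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Sxy = Xc.T @ yc / n
    w = np.linalg.solve(Sxx, Sxy)
    return w / np.linalg.norm(w)

def cca_feature(tweet_vec, w):
    """The 'cca' feature: projection of a tweet's word-count vector
    onto the first canonical direction."""
    return float(tweet_vec @ w)
```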
4.5.6 Window Co-occurrence-based Average Score
Since the neighboring relationship among words may contain indicative information for sentiment analysis, a score based on co-occurrence within a three-word window is calculated in this section.
Inspired by previous research [24], in which a propagation algorithm is applied to analyze the sentiment of online reviews, a modified graph-based propagation algorithm is adopted here to obtain the polarity score of each word in tw_total_dict based on the three-word-window neighboring relationship.
First, a co-occurrence dictionary is constructed by parsing all the tweets in the Twitter dataset. The key of each item in this dictionary is a word pair 𝑤𝑖_𝑤𝑗, and its value is the number of times 𝑡(𝑤𝑖, 𝑤𝑗) the two words appear together in the three-word window.
Then, as an initial propagation graph, all the words in tw_total_dict are taken as the nodes of the graph. The value of each node is initialized as 1 for the words in the Positive category of pol_dict_ini, and as -1 for those in the Negative category. For the other words, the initial node value is set as 0. Then, in each iteration, the value of each node is updated by

    v'_{n_i} = (1 − α) ∙ [ Σ_{n_j ∈ NEI_{n_i}} v_{n_j} ∙ (1 + log 𝑡(n_i, n_j)) ] / [ Σ_{n_j ∈ NEI_{n_i}} (1 + log 𝑡(n_i, n_j)) ] + α ∙ v_{n_i}

where NEI_{n_i} is the set of nodes neighboring node n_i, and 𝑡(n_i, n_j) is the co-occurrence count of the words of nodes n_i and n_j, according to the previously built co-occurrence dictionary. α is a tuning parameter, set as 0.6 in this step. In the converged final graph, each node has a float value indicating the polarity of its word. A polarity dictionary is obtained from this final graph, and the average score of each tweet is calculated based on this newly constructed polarity dictionary. This float score for each tweet is named the ‘win3’ feature.
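The update rule above can be sketched on a toy graph as follows; convergence testing is simplified to a fixed iteration count, and since the thesis does not specify the logarithm base, the natural log is assumed.

```python
import math

def propagate(neighbors, seeds, alpha=0.6, iters=50):
    """Graph propagation as in the formula above.  `neighbors` maps a
    word to {neighbor: co-occurrence count t}; `seeds` maps Positive /
    Negative seed words to +1 / -1.  Each iteration applies
    v' = (1 - a) * weighted-neighbor-average + a * v,
    with edge weight 1 + log t(ni, nj) (natural log assumed)."""
    v = {w: seeds.get(w, 0.0) for w in neighbors}
    for _ in range(iters):
        nv = {}
        for w, nbrs in neighbors.items():
            weights = {u: 1 + math.log(t) for u, t in nbrs.items()}
            total = sum(weights.values())
            if total == 0:                      # isolated node: keep value
                nv[w] = v[w]
                continue
            avg = sum(v[u] * wt for u, wt in weights.items()) / total
            nv[w] = (1 - alpha) * avg + alpha * v[w]
        v = nv
    return v
```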
4.5.7 POS-based Feature
POS (part-of-speech) information is commonly used in NLP analysis, and some part-of-speech pairs are especially sentiment-expressive. Here, all the tweets are first processed by the Stanford Parser19 to obtain dependency trees. Then the 10 most common and sentiment-expressive POS pairs (i.e. ‘acomp’, ‘advmod’, ‘amod’, ‘conj’, ‘dobj’, ‘neg’, ‘nsubj’, ‘purpcl’, ‘rcmod’, and ‘xcomp’) are chosen manually, and the sentiment expressed by these pairs is decided according to manually constructed rules (e.g. the sentiment expressed by a ‘neg’ pair is the opposite of the sentiment of the polarity word in the pair). For each tweet in the Twitter dataset, every above-mentioned POS pair that appears in the tweet is given a polarity label. Then, to decide the polarity of the tweet, a simple majority voting method is applied: the polarity label with the largest POS pair count passes its polarity to the tweet. This feature is called ‘pos’ in the later analysis steps.
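A sketch of the pair-polarity rules and the majority vote; the seed word list and the pre-parsed (relation, head, dependent) tuples are illustrative stand-ins for pol_dict_ini and the actual Stanford Parser output.

```python
from collections import Counter

# Illustrative polarity seed words; the thesis derives these from
# pol_dict_ini.
WORD_POLARITY = {"good": 1, "great": 1, "bad": -1, "slow": -1}

def pair_polarity(relation, head, dependent):
    """Polarity of one dependency pair under simple hand-written rules:
    a 'neg' relation flips the polarity of the polar word in the pair;
    otherwise the pair carries the polarity of whichever word is polar."""
    pol = WORD_POLARITY.get(head, 0) or WORD_POLARITY.get(dependent, 0)
    return -pol if relation == "neg" else pol

def pos_feature(pairs):
    """The 'pos' feature: majority vote over the tweet's sentiment-
    expressive dependency pairs, given as (relation, head, dependent)."""
    votes = Counter(pair_polarity(*p) for p in pairs)
    pos, neg = votes[1], votes[-1]
    return 1 if pos > neg else -1 if neg > pos else 0
```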
19 http://nlp.stanford.edu/software/lex-parser.shtml/.
5. Experiment
In this section, the experiment is carried out in the restaurant domain, but the basic steps and training methods are also applied in the field expansion section.
5.1 Overview
The main steps of the whole experiment are described in the flow chart in Figure 5.1. After collecting the Twitter and review data, the location definition step is carried out, and two dictionaries are manually constructed based on the datasets and the online dictionaries explained previously. Then three main classifiers are trained and used to classify the tweets. Finally, based on the classification results and the data analysis results, the cultural effect on evaluations is clarified.
Figure 5.1: The Main Flow of the Experiment
5.2 Preprocessing
For the Twitter dataset, the preprocessing basically consists of 12 steps: a) ‘RT’ and URL link deletion, b) Emoticon conversion, c) Lower-casing, d) HTML transcoding, e) Hashtag conversion, f) Punctuation deletion, g) Word segmentation, h) Deletion of non-alphabet words and single-alphabet words, i) Stop word discarding, j) Repeated alphabet reduction, k) Chat word conversion, l) Lemmatization. However, the processing is task-specific in some steps. For example, as input for the Stanford Parser, only a) to e) are carried out.
For the review dataset, the preprocessing is much simpler, containing only 6 steps: c) Lower-casing, g) Word segmentation, h) Deletion of non-alphabet words and single-alphabet words, i) Stop word discarding, k) Chat word conversion, l) Lemmatization.
Here, ‘RT’, URL links, hashtags, and repeated alphabets are recognized and processed by regular expressions. The emoticon conversion, stop word discarding, and chat word conversion steps are based on a manually constructed emoticon dictionary (with 300 entries), stop word list (with 145 terms), and chat word dictionary (with 150 entries) respectively. The lower-casing, HTML transcoding, word segmentation, and lemmatization steps are implemented with the NLTK20 tools.
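Steps a) to k) can be sketched with the standard library as follows. Lemmatization, done with NLTK in the thesis, is omitted here, and the miniature conversion tables are illustrative stand-ins for the hand-built dictionaries.

```python
import html
import re

# Hypothetical miniature conversion tables; the thesis uses hand-built
# dictionaries (300 emoticons, 145 stop words, 150 chat words).
EMOTICON = {":)": "emopos", ":(": "emoneg"}
STOP = {"the", "a", "is"}
CHAT = {"ur": "your", "thx": "thanks"}

def preprocess_tweet(text):
    """Steps a)-k) of the tweet preprocessing described above."""
    text = re.sub(r"\bRT\b|https?://\S+", " ", text)        # a) RT + URLs
    for emo, tok in EMOTICON.items():                       # b) emoticons
        text = text.replace(emo, " %s " % tok)
    text = html.unescape(text).lower()                      # c), d)
    text = re.sub(r"#(\w+)", r"\1", text)                   # e) hashtags
    text = re.sub(r"[^\w\s]", " ", text)                    # f) punctuation
    tokens = text.split()                                   # g) segmentation
    tokens = [t for t in tokens if t.isalpha() and len(t) > 1]  # h)
    tokens = [t for t in tokens if t not in STOP]           # i) stop words
    tokens = [re.sub(r"(.)\1{2,}", r"\1\1", t) for t in tokens]  # j) repeats
    tokens = [CHAT.get(t, t) for t in tokens]               # k) chat words
    return tokens
```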
5.3 Sentiment Classification
In this research, sentiment classification is divided into two steps. The first step, subjectivity classification, classifies the spam-filtered dataset into a subjective dataset and an objective dataset. The second step, polarity classification, further classifies the subjective dataset into a positive dataset and a negative dataset. In each of these two steps, a pre-trained classifier carries out the classification. The training process of these two classifiers is described in detail as follows.
Feature selection. In the previous sections, 6 groups of features were introduced: the ‘syn’, ‘5s’, ‘rv’, ‘cca’, ‘win3’, and ‘pos’ features. All combinations of these 6 feature groups are implemented in this experiment.
Training method. The SVM (Linear, RBF, and Polynomial) methods and the Naïve Bayes (Gaussian, Multinomial, and Bernoulli) methods are used in this experiment.
Training implementation. The total number of implementation variations turns out to be (2^6 − 1) ∙ 6 = 378.
Validation method. The standard 10-fold cross-validation is applied here.
Training set. For the subjectivity classifier, 1,000 tweets, half of them objective and half subjective, are selected from the manually labeled tweets.
For the polarity classifier, 1,000 tweets, half positive and half negative, are selected from the manually labeled subjective tweets.
Here, the polarity of each training tweet is judged by three persons, and a majority vote finally decides the polarity of the tweet.
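The enumeration of the 378 training runs can be sketched as follows; the method names are illustrative labels for the six SVM and Naïve Bayes variants.

```python
from itertools import combinations

FEATURES = ["syn", "5s", "rv", "cca", "win3", "pos"]
METHODS = ["svm_linear", "svm_rbf", "svm_poly",
           "nb_gauss", "nb_multinomial", "nb_bernoulli"]

def training_runs():
    """Enumerate every non-empty feature combination paired with every
    training method, giving the (2^6 - 1) * 6 = 378 runs above."""
    runs = []
    for k in range(1, len(FEATURES) + 1):
        for combo in combinations(FEATURES, k):
            for method in METHODS:
                runs.append((combo, method))
    return runs
```

Each run is then evaluated with standard 10-fold cross-validation over the 1,000 labeled tweets.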
Test results. Top-7 test results of the subjectivity classifiers and the polarity classifiers are shown in
Table 5.1 and Table 5.2.
Table 5.1: Subjectivity Classifiers Performance
syn 5s rv win3 cca pos accuracy
74.7%
74.9%
75.8%
76.4%
76.5%
77.5%
78.4%
20 http://www.nltk.org/.
Table 5.2: Polarity Classifiers Performance
syn 5s rv win3 cca pos accuracy
82.2%
85.3%
87.2%
89.6%
89.9%
90.6%
91.1%
In these tables, the left columns indicate the adopted features with black circle marks, and the right column shows the accuracy of the implementation. As shown, the best-performing subjectivity classifier is obtained with the feature combination of ‘syn’, ‘rv’, ‘win3’, and ‘pos’ and the SVM polynomial training method, while the best-performing polarity classifier is obtained with the feature combination of ‘rv’, ‘win3’, ‘cca’, and ‘pos’ and the SVM linear training method. These two classifiers are adopted in the classification step for the whole spam-filtered Twitter set.
Several failure cases from the two sentiment classification steps are given below, and these cases are further analyzed to figure out the reasons for the errors.
Subjectivity classification:
Case 1:
Tweet text: RT @t0shiba: Just a note from last night...Pizza Hut is good pizza...If you have dry
skin...Rub some Pizza Hut on it...Healed!
Manual label: subjective
Classification result: objective
Analysis: this text uses irony to express negative sentiment, but it is difficult for a machine to recognize such sarcasm in a sentence.
Case 2:
Tweet text: If only some people knew how KFC got the chicken to you, some would rather starve
than ever eat KFC again! #SpeakUp
Manual label: subjective
Classification result: objective
Analysis: there is a comparison expression in this tweet, which is quite confusing for a classifier.
Case 3:
Tweet text: @MenHumor: There's a special place in hell for murderers and the guy who decided
what time breakfast ends at McDonalds.”
Manual label: objective
Classification result: subjective
Analysis: even humans cannot fully understand the meaning of the text, let alone for a machine.
Polarity classification:
Case 1:
Tweet text: @SamayoaMarissa: "@UberFacts: Burger King uses approximately 1/2 million pounds of bacon every month." you pig killers )':” >:)
Manual label: negative
Classification result: positive
Analysis: the emoticons at the end of the text have clouded the judgment of the classifier.
Case 2:
Tweet text: Ewwwwwww! RT @DREADHEADNATI0N: Mcdonalds did me right… I could eat it
everyday! #fatgirlproblem
Manual label: positive
Classification result: negative
Analysis: because the proposed classifier also takes the commonly used hashtags into account, this may sometimes lead to incorrect results.
6. Analysis
6.1 Statistical Analysis
Based on the ‘list of restaurant chains’ on Wikipedia, 6 restaurant chains (i.e. McDonald’s, KFC, Burger King, Pizza Hut, Subway, and Starbucks) that operate worldwide are chosen as the research subjects. After filtering the Twitter dataset by these restaurants’ names, the location definition process is carried out, and 33 countries, shown in Table 6.1, are selected as target countries. That is, only tweets from these 33 countries and areas remain to be processed by the spam filtering step. While the original Twitter dataset amounts to 10 million tweets, the size of the pre-filtered and spam-filtered Twitter dataset is reduced to approximately 2 million. This dataset becomes the input of the later sentiment classification steps.
Table 6.1: Target Countries and Their ISO 3166-1 Codes (Restaurant Domain)
United States (US), United Kingdom (GB), Australia (AU), Indo-
nesia (ID), Malaysia (MY), Canada (CA), Philippines (PH), Singa-
pore (SG), Brazil (BR), India (IN), South Africa (ZA), Japan (JP),
Mexico (MX), France (FR), Netherlands (NL), Greece (GR), Thai-
land (TH), China (CN), Russia (RU), Spain (ES), Argentina (AR),
Chile (CL), South Korea (KR), Germany (DE), Italy (IT), Ireland
(IE), Venezuela (VE), Colombia (CO), Poland (PL), Egypt (EG),
Ukraine (UA), New Zealand (NZ), Viet Nam (VN)
In this section, basic statistical analysis is carried out to obtain a general overview of these restaurants in the 33 countries. Table 6.2 lists the number of preprocessed tweets for each target country. Figure 6.1 shows the distribution of tweets over the 6 restaurants in each country.
Table 6.2: Tweet Amount for Each Country (Restaurant Domain)
US 888,221 JP 119,721 KR 11,201
GB 114,513 MX 28,298 DE 10,187
AU 12,328 FR 51,124 IT 5,513
ID 105,397 NL 84,755 IE 8,683
MY 67,316 GR 72,084 VE 22,437
CA 118,117 TH 59,854 CO 6,718
PH 21,703 CN 59,903 PL 3,920
SG 23,864 RU 37,255 EG 4,069
BR 54,357 ES 39,424 UA 3,567
IN 11,306 AR 27,522 NZ 2,547
ZA 5,271 CL 24,512 VN 1,374
Figure 6.1: General Distribution of Tweets in Restaurant Domain
As shown in Table 6.2 and Figure 6.1, the following conclusions can be drawn.
a) There is a huge difference among the tweet amounts of the target countries, and tweets from the United States predominate in quantity;
b) The distribution of tweets over the 6 restaurants differs considerably from country to country;
c) The overall distributions over the 6 restaurants are also quite biased, in that tweets about some restaurants, such as Pizza Hut, are much fewer than those about others;
d) These distributions may indicate the popularity of each restaurant in each country.
6.2 Basic Sentiment Analysis
After applying the optimal subjectivity classifier and polarity classifier described in Section 5.3, the preprocessed Twitter dataset is divided into 3 polarity groups: positive, negative, and objective. Based on these 3-way classification results, a series of analysis approaches are implemented, as described in the following subsections.
6.2.1 Polarity Distribution
To figure out the proportions of positive, negative, and objective tweets for each country and for
each restaurant, the polarity distribution graphs are plotted, as shown in Figure 6.2~6.7. The rose
color stands for the positive tweets, the azure color stands for the negative sentiment, and the
lemon yellow stands for the objective tweets.
Figure 6.2: Polarity Distribution for Burger King
Figure 6.3: Polarity Distribution for KFC
Figure 6.4: Polarity Distribution for McDonald’s
Figure 6.5: Polarity Distribution for Pizza Hut
Figure 6.6: Polarity Distribution for Starbucks
Figure 6.7: Polarity Distribution for Subway
From the above polarity distribution graphs, it can be seen that:
a) For different restaurants, the general distributions of positive, negative, and objective tweets are fairly different. For instance, the proportion of positive tweets for McDonald’s obviously outstrips that of the other restaurants, which may suggest that McDonald’s enjoys a better reputation among people at the world level;
b) Objective tweets predominate in amount for all the target restaurants, while positive tweets outnumber negative tweets in general;
c) For the same restaurant, people from different countries seem to have quite different attitudes. For example, in the case of McDonald’s, Indonesian people seem to favor the restaurant less than American people do, since Indonesia has a larger percentage of negative tweets and a smaller percentage of positive tweets than the United States.
6.2.2 Sentiment Map
With the positive, negative, and objective tweets given polarity scores of 1, -1, and 0 respectively, the sentiment maps for the target restaurants are depicted in Figures 6.8~6.13. On the gradient color axis, green represents negative sentiment and red represents positive sentiment.
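The per-country score plotted on these maps can be sketched as:

```python
from collections import defaultdict

POLARITY_SCORE = {"positive": 1, "negative": -1, "objective": 0}

def country_scores(classified_tweets):
    """Average polarity score per country, the quantity shown on the
    sentiment maps: positive = 1, negative = -1, objective = 0.
    `classified_tweets` is a list of (country_code, label) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for country, label in classified_tweets:
        totals[country] += POLARITY_SCORE[label]
        counts[country] += 1
    return {c: totals[c] / counts[c] for c in totals}
```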
Figure 6.8: Sentiment Score Map of McDonald’s
Figure 6.9: Sentiment Score Map of KFC
Figure 6.10: Sentiment Score Map of Burger King
Figure 6.11: Sentiment Score Map of Pizza Hut
Figure 6.12: Sentiment Score Map of Subway
Figure 6.13: Sentiment Score Map of Starbucks
By representing sentiment with gradient color, the above sentiment maps demonstrate the overall distributions of people’s opinions about the targets in the restaurant domain. Compared to the polarity distribution graphs in the last subsection, these maps give a more intuitive presentation of the sentiment distribution and make it possible to take geographic elements into consideration as well.
6.2.3 Sentiment Keywords Extraction
The more specific reasons why people like or dislike a target, or the concrete characteristics of a target that shape people’s attitudes, remain unclear and need further exploration. To this end, the frequently occurring sentiment words, either positive or negative, are extracted with their frequencies for each target restaurant, and tag clouds are harnessed as a tool to describe these representative sentiment keywords. Figures 6.14~6.19 give the tag clouds for the targets. A white background indicates positive sentiment, and a black background indicates negative sentiment. The size of a word denotes its occurrence frequency, and the multiple colors of the words have no special significance.
Figure 6.14: Tag Cloud of Sentiment Keywords for McDonald’s
Figure 6.15: Tag Cloud of Sentiment Keywords for KFC
Figure 6.16: Tag Cloud of Sentiment Keywords for Burger King
Figure 6.17: Tag Cloud of Sentiment Keywords for Pizza Hut
Figure 6.18: Tag Cloud of Sentiment Keywords for Subway
Figure 6.19: Tag Cloud of Sentiment Keywords for Starbucks
From these tag clouds, we may obtain some clues about the reasons for people’s likes or dislikes of the target restaurants. However, as can be seen in both the positive and negative tag clouds, not much specific information can be acquired, due to the large overlap of vocabulary among the different targets. Thus, to compensate for this deficiency, (Attribute, Value) pairs are used to describe the target restaurants, as introduced in the next subsection.
6.2.4 (Attribute, Value) Pairs Extraction
Based on the Stanford dependency trees obtained in the sentiment classification step, the sentiment-expressive word pairs (explained in Section 4.5.7), each of which typically but not exclusively consists of one noun (attribute) and one adjective (value), are selected to construct the (Attribute, Value) list for each target. Tables 6.3~6.8 give parts of the (Attribute, Value) lists of the target restaurants. Red and green represent positive and negative sentiment respectively. The numbers following the value words denote frequencies.
Table 6.3: (Attribute, Value) list of McDonald’s
Attribute value
mcdonalds new 33377, good 3992, fat 3055, great 1533, best 1525, bad 1519, big 1210, commercial 913,
happy 907, large 661, better 743, delicious 637, fresh 485, nasty 384, american 379, healthy
363, nice 331, fast 326, yummy 315, bagged 307, expensive 298, different 297, small 280,
mcgorgeous 74, sonic 233, unhealthy 183, worst 162, funny 156, stupid 151, packaged
150, greatest 148, poor 148, favorite 147, perfect 145, beautiful 143, ill 132, regular 120,
romantic 119, terrible 116, slow 113, weird 102, greater 101, original 99, successful 99, quick
87, worse 75, greasy 69, horrible 69, instant 68, awful 66, biggest 65, huge 63, famous 61,
special 60, busy 59, international 57, wonderful 57, healthier 54, top 53, cheaper 51, lucky 49,
desperate 48, fantastic 46, hilarious 46, bigger 45, classic 45, tasty 45, normal 44, common
43, creative 41, scary 40, standard 39, acceptable 35, nastiest 34, daily 34, dirty 33, fatty 33,
ridiculous 32, slowest 31
food fast 1861, free 1263, chinese 482, great 451, good 359, best 302, worst 281, healthy 244,
favorite 216, new 151, leftover 133, unhealthy 118, delicious 104, bad 104, terrible 90, fat 82,
better 75, nasty 67, indigestible 60, nice 60, mexican 58, greasy 46, normal 45, regular 35,
asian 35, expensive 35, fresh 34, organic 34, lethargic 33, nutritious 31, awful 30, healthier
29, indian 27, filthy 27, healthiest 26, horrible 25
burger delicious 606, double 590, cheese 307, better 243, best 184, free 151, mcbusted 107, big 101,
good 92, large 45, fat 38, fish 33, nice 32, special 32, great 29, bad 28, disappointing 21,
small 17, expensive 17, huge 16, nasty 16
chicken real 153, good 143, large 115, fried 92, bad 61, best 60, cheese 58, grilled 52, big 37, french
31, fresh 29, better 22, crispy 21, small 19, garlic 19, hot 18, nasty 15, classic 14, delicious 14
meal happy 1574, free 475, big 456, large 323, whole 277, full 136, extra 115, unhappy 110, happier 97, best 89, traditional 73, favorite 67, good 67, romantic 66, healthy 54, cheeseburger 38,
nice 33, great 30, breakfast 28, bad 22, worst 16, despicable 15, regular 15, small 15, delicious
14, terrible 13
breakfast good 2476, best 1178, big 742, nice 734, bad 679,great 439, perfect 418, full 330, nasty 206,
early 170, delicious 143, yummy 109, favorite 99, english 91, hot 89, healthy 76, poor 75, fat
64, fabulous 62, happy 55, better 47, wonderful 44, worst 32, quick 31
coffee free 3223, bagged 468, hot 453, small 428, packaged 263, large 211, good 210, best 185,
breakfast 58, worse 58, great 49, iced 48, black 45, bad 44, nice 37, delicious 29, worst 29,
better 27, awful 23, horrible 17
fries large 627, fresh 369, french 314, good 184,cheese 136, best 94, small 71, hot 66, cold 58,
great 39, greasy 32, big 29, nasty 28, favorite 27, yummy 21, delicious 16, famous 15
burger unfit 476, double 145, cheese 118, best 104, good 71, large 58, popular 37, expensive 31,
monthly 32, better 32, special 30, different 23, nasty 21, fat 18, favorite 16, big 15, delicious
15, mcdouble 15
cheeseburger mcdouble 3704, double 1291, extra 154, large 126, 50cent 29, small 28, big 26, good 18, bad
16, gigantic 15, best 15
mcflurry best 37, m&m 26, yummy 26, chocolate 24, delicious 23, great 22, small 19, good 18, iced 18,
kitkat 15
pie sweet 231, apple 54, hot 53, delicious 46, good 33, chocolate 29, spinach 26, large 23, bad 22,
best 21, cherry 16
pancake breakfast 69, good 66, best 37, chocolate 27, delicious 25, bad 24, chipotle 23, better 21,
blueberry 18, dry 18, hot 15, nasty 15, nice 15
frappe chocolate 142, breakfast 36, good 25, large 16, delicious 15
service full 399, great 67, worst 27,good 26, horrible 26, slow 23, terrible 21, wonderful 17, bad 17,
nice 16, slowest 16, smile 16
mcmuffin cheese 76, delicious 23, french 20, bad 20, english 18, better 15, breakfast 15
Table 6.4: (Attribute, Value) list of KFC
Attribute Value
kfc fried 1361, healthy 1059, new 913, original 634, zinger 450, great 295, commercial 263, best
240, poor 221, delicious 192, fresh 189, bad 178, famous 151, fat 149, better 138, nice 115,
worst 105, nasty 73, special 73, greasy 55, terrible 53, yummy 52, busy 39, perfect 38,
favorite 31, happy 29, american 28, expensive 28, nastiest 27, stupid 26, unhealthy 26, fastest 22,
mediocre 22, classic 20
kentucky fried 1889, great 442, favorite 652, top 304, poor 302, good 284, best 228, ridiculous 99,
national 92, bad 88, classic 75, favorite 59, special 59, international 37, fresh 34, nice 33,
professional 28, crazy 24, greatest 24
chicken fried 3290, original 610, poor 236, best 224, good 219, hot 90, real 89, worst 74, bad 61,
delicious 58, great 56, dry 45, terrible 45, greasy 40, cheese 38, fat 35, fresh 32, cold 29, nice
29, small 27, nasty 25, buttered 19, clean 19, famous 18, healthy 18, artificial 18, favorite
16, finest 15, fry 15, crunchy 14
food great 1659, fast 287, good 136, chinese 101, favorite 90, best 88, healthy 74, unhealthy 43,
worst 40, delicious 34, bad 28, nastiest 25, fried 16, unusual 15, nice 15, yummy 15
dinner unhealthy 295, special 40, roast 31, nice 28, best 26, delicious 20, romantic 19, full 17, good
16, big 15, great 15, healthy 15
lunch good 42, healthy 27, big 25, special 20, best 18, great 16, happy 16
burger double 93, zinger 57, fish 28, good 27, best 27, cheese 20, bad 17, fat 17, hot 17
meal free 150, good 101, big 87, best 66, additional 41, large 32, zinger 27, hot 17, favorite 16,
romantic 16, full 16, gravy 16, great 16, delicious 15, fried 15, nice 15
wings hot 166, best 19, big 19, chicken 17, large 16, zinger 16, good 15, crispy 15, dry 15
fries lontong 356, cheese 73, hot 57, large 46, french 28, cheesy 26, best 25, gravy 22, healthy 21,
weird 21, bad 20, delicious 20, fat 16, good 16, tasty 15, yummy 15
chips fish 58, gravy 37, cheese 25, best 19, delicious 18
service public 54, national 35, terrible 26, early 23, bad 19, great 18, horrible 18, slow 17
Table 6.5: (Attribute, Value) list of Burger King
Attribute Value
burger breakfast 1029, new 735, good 696, better 450, big 304, free 294, commercial 281, nasty 188,
bad 167, double 157, best 156, cheese 105, fat 88, french 59, great 57, nice 54, original 44,
expensive 38, trash 36, delicious 34, giant 33, weird 31, funny 30, horrible 30, awful 29,
chipotle 28, fish 28, hot 27, huge 26, small 26, special 25, standard 22, worst 22, $5 19,
roasted 19, terrible 18, top 18, fresh 18
food fast 328, worst 152, chinese 88, best 75, great 63, favorite 58, good 45, nastiest 29, bad 28,
slowest 24, american 16, nasty 16
restaurant unhealthiest 61, fast 4, favorite 4, net 3, successful 3
breakfast best 103, good 56, full 26, bad 18, free 18, great 18, better 15, big 15, delicious 13, english
12, nasty 12, nice 12, sonic 12
chicken original 88, mad 67, good 56, large 30, best 28, commercial 26, pure 20, fried 16, new 16, $5
14, bad 14, cheese 12, horrible 12
burgerking good 72, new 42, cheese 26, nice 22, best 20, fresh 20, great 20, bad 18, better 18, delicious
17, favorite 17, silent 17, daily 15, funny 15, large 15, stupid 13, yummy 13
sandwich original 84, new 73, big 58, fish 46, cheese 27, double 21, good 20, authentic 19, breakfast 19,
$1 13, american 13, chicken 12, horrible 12
meal free 41, large 30, big 28, romantic 21, best 17, scrumptious 15, hot 14, delicious 13, bad 12,
dutch 12, good 12, happy 11, special 11, vegetarian 10
place first 42, best 30, good 27, favorite 26, slowest 24, nastiest 22, worst 20, fast 16, new 15, nice
15, great 12, grim 10, ridiculous 10
menu impactful 58, favorite 26, detailed 24, new 19, whole 15, best 12
bacon double 156, cheese 61, extra 15, great 15, large 14, fat 14, fresh 11, fried 11, greasy 11
service worst 43, terrible 26, slow 22, good 16, horrible 16, slowest 15, best 12
taste good 55, better 30, great 25, bad 23, horrible 22, weird 18, different 12, similar 12, wrong 10
Table 6.6: (Attribute, Value) list of Pizza Hut
Attribute Value
pizza large 1877, best 1228, good 1025, commercial 924, new 898, stuffed 734, cheese 465, bad
454, favorite 429, general 385, crust 375, fresh 300, big 265, hot 228, better 212, great 186,
viral 174, delicious 166, full 155, nasty 154, wrong 149, garlic 136, nice 129, classic 103,
worst 98, cheesy 77, fried 74, italian 60, hawaiian 59, greasy 58, cold 46, fat 46, special 46,
cheaper 44, different 44, chicken 42, regular 35, $25 32, cheesestuffed 31, biggest 30, terrible
30, trashy 30
pizzahut big 143, national 88, new 87, good 69, delicious 56, great 54, best 44, commercial 20, fat 20,
favorite 19, large 19, stupid 19, bad 18, happy 18, hawaiian 17, healthy 15, slow 15
dinner single 132, big 129, good 36, lovely 26, delicious 25, great 20, romantic 18, best 18, favorite
17, $10 17, happy 15
wings hot 442, good 141, best 55, garlic 40, chinese 29, chicken 28, cheese 25, 50cent 23, asian 23,
better 22, bad 22, boneless 19, delicious 19, hottest 16, small 15, traditional 15
crust stuffed 1020, cheese 836, thin 150, great 134, best 124, large 120, good 86, cheesy 74,
delicious 29, cheesestuffed 25, regular 22, bad 21, $12 18, fabulous 17, perfect 17, soft 16
delivery international 177, local 71, free 30, special 28, late 26, brilliant 24, available 21, fast 16, good
16
food great 255, chinese 143, international 103, free 90, good 52, terrible 32, favorite 30, fast 28,
best 26, cold 26, bad 20, delicious 19, italian 19, organic 18, worst 17, disgusting 15, fried 15,
hot 15
pepperoni stuffed 90, large 69, cheese 55, crust 27, double 25, thin 25, hot 23, italian 16, best 15
sticks cheese 165, cinnamon 29, best 23, bread 23, good 18, hot 17, garlic 16, yummy 16
service bad 83, worst 46, terrible 31, horrible 27, great 22, slow 21, awful 21, good 19, poor 17, best
15, fantastic 15
chicken fried 74, best 26, garlic 25, french 24, delicious 20, grilled 15, hawaiian 15
Table 6.7: (Attribute, Value) list of Subway
Attribute Value
subway great 2899, new 1771, good 1519, fresh 1068, best 983, delicious 709, breakfast 590, cheese
369, favorite 251, bad 227, commercial 200, healthy 193, crowded 149, yummy 146, fat 135,
nice 124, better 116, eatfresh 105, greatest 104, big 99, worst 90, sonic 87, scary 86, garlic 75,
chipotle 64, weird 60, nasty 59, top 56, daily 51, special 51, stupid 43, green 38, grumpy 36,
adorable 33, tame 31, expensive 30, funny 30, horrible 30, packed 29, $5 28, cold 28, fast 25,
perfect 24, terrible 24, wonderful 23
cookies best 747, good 434, chocolate 187, fresh 57, delicious 56, great 45, breakfast 35, nice 31, bad
29, perfect 29, famous 25, m&m 25, raspberry 24, fat 22, soft 21, hard 18, top 17, daily 16,
favorite 16, terrible 15, oatmeal 14, sweet 14, tasty 14, healthy 13, wonderful 13, yummy 13
sandwich best 391, new 299, delicious 243, national 181, good 167, favorite 165, victorious 87, cheese
56, great 48, nice 46, whole 45, big 43, bad 36, hot 33, worst 32, better 31, breakfast 30,
healthy 30, tuscan 30, vegetable 30, different 29, fat 27, italian 24, $5 19, finest 19, giant 19,
cold 18, huge 18, american 17, fresh 17, indian 16, nastiest 16, toasted 16, weird 15,
expensive 15, flatbread 13, garlic 13, greatest 13
lunch good 70, healthy 57, best 54, nice 40, delicious 29, fresh 20, romantic 17, full 15, perfect 13,
quick 13, special 12, wonderful 12
food chinese 361, healthy 182, great 168, fast 157, good 150, best 86, favorite 75, mexican 27,
greasy 22, fresh 21, indian 20, nice 20, asian 19, delicious 17, different 16, fatty 16, spanish
16, bad 15, better 14, expensive 13, horrible 12, overrated 12, unhealthy 12
chicken sweet 121, tuscan 109, delicious 103, cheese 88, footlong 74, double 45, fresh 42, good 30,
italian 27, garlic 19, flat 18, nice 16, fried 15, great 15, roast 15, steak 14
breakfast good 99, healthy 34, best 30, bad 27, better 20, great 20, awful 18, balanced 13, delicious 13,
english 12, favorite 11
bread cheese 146, garlic 114, flat 88, italian 86, fresh 47, white 43, american 34, hot 31, good 29,
stale 28, best 22, jalapeño 21, great 20, delicious 16, bacon 14, healthy 13, meat 12, salad 11,
soft 10
salad best 28, good 24, breakfast 20, healthy 19, italian 17, cheese 15, egg 15, fresh 14, chopped 12
meal stupid 106, best 46, good 31, full 28, healthy 20, romantic 17, great 15, bad 13, big 13,
miserable 12
cheese extra 88, swiss 35, fat 31, pepper 31, steak 28, italian 23, flat 19, best 15, white 15, jalapeño
14, yellow 13
service great 29, worst 24, bad 18, better 17, horrible 17, normal 14, rude 14
Table 6.8: (Attribute, Value) list of Starbucks
Attribute Value
starbucks great 3648, new 3319, good 3093, favorite 2536, better 1567, best 1503, delicious 1201,
yummy 749, poor 733, green 678, topshop 626, nice 601, bad 597, perfect 569, happy 434,
big 341, fresh 302, expensive 295, regular 270, sophisticated 267, noble 255, economical 243,
beautiful 224, different 194, special 192, daily 181, original 165, worst 149, cheaper 146,
horrible 141, global 126, nasty 102, greatest 91, wonderful 89, busy 78, reusable 77, super 75,
ridiculous 75, creative 74, fat 72, healthy 71, weird 67, popular 66
coffee good 1494, best 988, favorite 845, hot 639, expensive 431, black 295, great 245, bad 231,
breakfast 218, exploitative 201, delicious 200, nice 189, poor 151, iced 124, healthy 104, cold
99, fresh 93, instant 88, terrible 87, packaged 74, nasty 73, normal 68, yummy 65, daily 53,
different 52, special 52, worst 49, horrible 39, classic 34, overpriced 33
drink favorite 3870, free 3646, wrong 965, hot 729, best 456, good 369, seasonal 283, cold 182,
expensive 99, delicious 89, special 72, nice 61, cuddle 57, complimentary 42, mineral 36,
chocolate 30, great 24, popular 23
tea green 3398, hot 344, bubble 297, black 217, sweet 181, good 108, best 85, iced 56, great 42,
favorite 35, nice 27, breakfast 25, herbal 18, nonfat 18, red 17, poor 17, bad 17, chamomile
16, classic 16, daily 16, refresh 16
barista favorite 163, cute 152, best 74, temporary 62, friendly 38, good 26, happy 22, attractive 16,
beautiful 15, rude 15, certified 15
latte delicious 104, french 76, good 65, hot 58, brûlée 57, yummy 46, chocolate 45, best 44,
favorite 35, breakfast 28, great 27, nonfat 18, fat 18, iced 18, nice 16
mocha white 1502, crumble 324, chocolate 260, delicious 101, salted 88, hot 74, best 67, peppermint
66, good 57, favorite 44, great 29, yummy 20, bad 20, iced 18, nice 18, perfect 17, whitechocolate 17
menu new 124, whole 55, best 26, entire 24, seasonal 20, daily 19, winter 15
cake chocolate 70, cheese 59, marble 45, good 31, best 28, lemon 25, classic 24, new 24, bad 23,
complimentary 23, fetid 19, sweet 19, birthday 18, crumble 18, delicious 17, fat 16, favorite
16, festive 16, great 15, healthy 15, nice 15, obnoxious 15, truffle 15
place better 486, favorite 94, great 84, best 67, good 51, special 38, expensive 34, historic 24, nice
24, quiet 24, overrated 23, wonderful 23, exclusive 22, greatest 18
milk chocolate 82, nonfat 41, hot 30, almond 24, bad 22, fat 22, delicious 20, fresh 19, best 18,
classic 16, diabetic 16, good 16, latte 15, allergic 15, bubble 15
frappe green 464, chocolate 72, crumble 67, white 42, delicious 25, hot 25, whipped 20, brûlée 18,
caramel 18, cotton 17, good 17, great 17, berry 16, exclusive 16
cookie crumble 466, chocolate 68, big 39, dough 25, latte 25, cute 23, good 23, delicious 22,
favorite 22, frosted 16, ginger 15, great 15, perfect 15
gingerbread latte 282, good 41, delicious 34, favorite 26, yummy 20, best 18, seasonal 18
taste good 291, better 72, bitter 61, burnt 59, bad 48, great 43, different 29, heaven 27, wonderful
22, awful 21, nice 21, new 20, alien 20, delicious 20, horrible 16, best 15, rich 15, weird 15,
nasty 15, perfect 15, special 15, strong 15
donut waffle 157, chocolate 88, good 35, best 25, breakfast 25, cheese 24, fresh 24, great 24,
delicious 19, krispy 19, sweet 18, swiss 16
frappuccino crumble 53, chocolate 42, delicious 29, yummy 28, favorite 23, english 22, best 21, good 21,
white 20, caramel 18, wonderful 18, crème 15, strawberry 15
service active 149, great 101, best 25, terrible 21, full 21, good 20, horrible 19, greatest 19, slow 17,
awful 17, global 17, quick 16, slowest 16
cappuccino normal 137, good 28, better 27, hot 25, nice 18, best 18, french 16, nonfat 16, fat 15
wifi free 413, fast 74, unlimited 24, great 23, slow 19, poor 18
The above (Attribute, Value) lists contain abundant information on the specific characteristics of
the target restaurants, and from these lists it is relatively easy to figure out the detailed, concrete
reasons behind people’s opinions of the restaurants. For instance, the ‘Attribute’ column extracts a
particular menu item or service of a restaurant, while the ‘Value’ column gives the features of that
product together with the sentiment-bearing words used to describe it.
6.3 Culture-based Analysis
As one of the main objectives of this research, the relationship between the user evaluations for
global restaurants and cultural background is taken as the analysis subject in this section.
Based on each country’s scores for the 6 restaurants, the k-means method is applied to cluster the
33 target countries into several groups. Here, k is empirically set to 2~10, and Figures 6.20~6.28
show the world maps based on the corresponding clustering results. Countries filled with the same
color belong to the same cluster.
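The clustering step described above can be sketched as follows. This is a minimal, self-contained Lloyd's-algorithm stand-in for the k-means implementation actually used, and the country subset and score matrix are invented purely for illustration (each row holds one country's average sentiment scores for the 6 restaurants):

```python
def kmeans(points, k, iters=100):
    """Minimal Lloyd's algorithm; deterministic seed = first k points."""
    centers = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
                      for p in points]
        if new_assign == assign:          # converged
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

countries = ["US", "GB", "JP", "KR", "IT"]   # subset for illustration
scores = [                                    # invented (country x restaurant) scores
    [0.41, 0.35, 0.38, 0.30, 0.44, 0.39],    # US
    [0.39, 0.33, 0.36, 0.28, 0.42, 0.37],    # GB
    [0.22, 0.18, 0.25, 0.20, 0.27, 0.21],    # JP
    [0.20, 0.17, 0.24, 0.19, 0.25, 0.22],    # KR
    [0.05, 0.02, 0.08, 0.04, 0.06, 0.03],    # IT
]
labels = kmeans(scores, k=2)
groups = {}
for country, lab in zip(countries, labels):
    groups.setdefault(lab, []).append(country)
print(sorted(groups.values()))   # countries with similar scores end up together
```

In the actual experiment the same partitioning is repeated over the full 33-country score matrix for each k in 2~10, and each resulting grouping is plotted on a world map.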
Figure 6.20: Clustering Result Map (k=2)
Figure 6.21: Clustering Result Map (k=3)
Figure 6.22: Clustering Result Map (k=4)
Figure 6.23: Clustering Result Map (k=5)
Figure 6.24: Clustering Result Map (k=6)
Figure 6.25: Clustering Result Map (k=7)
Figure 6.26: Clustering Result Map (k=8)
Figure 6.27: Clustering Result Map (k=9)
Figure 6.28: Clustering Result Map (k=10)
Upon observing how the clustering result changes with k, the following can be noted.
a) Most English-speaking countries, as well as most non-English-speaking European countries,
fall into the same cluster when k is set to 2;
b) When k is set to 3, most non-English-speaking Asian countries form a group;
c) When k is set to 4, Italy forms a separate group, which suggests that Italian people may hold
quite different opinions towards these restaurants, or that some limitations have contributed to
this result;
d) When k is set to 5, the main South American countries form a separate group, which possibly
reflects location-based cultural effects;
e) When k is set to 6, non-English-speaking East and Southeast Asian countries form a separate
group, which may also reflect location-based cultural effects;
f) When k is set to 7, Spain and Mexico form a separate group, which may reflect language-based
cultural effects;
g) When k is set to 8, a few European countries form a separate cluster, suggesting that they
share more similar attitudes towards the target restaurants, compared to North American
countries and English-speaking countries in other areas;
h) When k is set to 9, RU and UA, and TH and VN, become two separate clusters, which reflects
location-based cultural effects;
i) When k is set to 10, CO and VE, and EG and ZA, become two separate clusters, which may
also demonstrate location-based cultural effects.
Focusing only on the k=10 case, the 10 clusters turn out to be:
US, CA, PH, SG, DE, AU, IN, NZ;
JP, ID, KR;
ES, MX;
IT;
RU, UA;
GB, NL, FR, GR, CN, IE, PL;
MY, BR, AR, CL;
TH, VN;
EG, ZA;
CO, VE.
Based on this clustering result, the following conclusions can be drawn.
a) The location-based cultural effects are quite obvious. For example, the cluster of BR, CL, AR,
the cluster of RU, UA, the cluster of ZA, EG, the cluster of TH, VN, and the cluster containing
most of the Western European countries have each been grouped together according to their
location and basic cultural background;
b) Some English-speaking Asian countries are clustered into the same group as the North
American countries, which suggests that the language-based cultural background may have
some effect;
c) Compared to most of the European countries, some countries, such as ES and IT, seem to
hold quite different opinions of these restaurants, which may suggest special attitudes rooted
in their food culture.
d) However, some confusing results still exist. For example, CN is clustered into the Western
European group, and MY into the South American group. These results may be explained by
factors other than general cultural background, such as eating patterns, brand reputation,
marketing strategies, and locally specialized products and services.
e) Limitations of the experiment, such as the fact that only fast food restaurants are taken as
targets, may also contribute to the unexpected results.
7. Field Expansion
From the experiments and results of sentiment analysis in the restaurant domain, it can be seen that
the proposed approach is quite promising for this kind of analysis, and informative conclusions
concerning food culture have been drawn. However, because the analysis is restricted to a single
field, the representativeness and transferability of the approach remain unclear and should be
further verified. To this end, in this section the proposed sentiment analysis approach is applied to
the travel domain. The basic methods and experimental steps stay the same, except that the
dictionaries and datasets are reconstructed, and all the classifiers are retrained on travel-related data.
7.1 Data
2,113,624 travel-related tweets (from Sep. 2013 to Dec. 2013) and 42,769 travel-related reviews are
collected as the Twitter dataset and the review dataset respectively. For the collection of Twitter data,
the names of 12 world attractions (i.e. Great Wall of China, Mount Fuji, Matterhorn, Sydney Opera
House, Statue of Liberty, Colosseum, Louvre Museum, Grand Canyon, Machu Picchu, Angkor Wat,
Eiffel Tower, Taj Mahal), along with a list of travel-related keywords selected according to their
occurrence frequency in the review dataset, are taken as the filtering condition. The target
languages are the same as before, and the target countries and their corresponding codes are listed in
Table 7.1. In total, 34 languages and 50 countries are taken into consideration.
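The filtering condition can be sketched as a simple keyword match over the tweet text. The attraction names come from the list above; the travel keyword list and the sample tweets are hypothetical stand-ins (the real keyword list is derived from review-dataset frequencies):

```python
# Attraction names from the thesis; keyword list and tweets are invented.
attractions = ["great wall", "mount fuji", "matterhorn", "sydney opera house",
               "statue of liberty", "colosseum", "louvre", "grand canyon",
               "machu picchu", "angkor wat", "eiffel tower", "taj mahal"]
travel_keywords = ["trip", "travel", "tour", "visit", "vacation"]   # hypothetical

def is_travel_related(tweet):
    """Keep a tweet if it mentions a target attraction or a travel keyword."""
    text = tweet.lower()
    return (any(a in text for a in attractions)
            or any(k in text for k in travel_keywords))

tweets = ["Finally saw the Eiffel Tower at night!",
          "My dinner was terrible",
          "Planning a trip to Peru next spring"]
print([t for t in tweets if is_travel_related(t)])   # drops the middle tweet
```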
Table 7.1: Target Countries and Their ISO 3166-1 Codes (Tourism Domain)
United States (US), United Kingdom (GB), Australia (AU), Indonesia (ID),
Malaysia (MY), Canada (CA), Philippines (PH), Singapore (SG), Brazil
(BR), India (IN), South Africa (ZA), Japan (JP), Mexico (MX), France (FR),
Netherlands (NL), Greece (GR), Thailand (TH), China (CN), Russia (RU),
Spain (ES), Argentina (AR), Chile (CL), South Korea (KR), Germany (DE),
Italy (IT), Ireland (IE), Venezuela (VE), Colombia (CO), Poland (PL), Egypt
(EG), Viet Nam (VN), El Salvador (SV), Slovenia (SI), Sweden (SE), Panama
(PA), Norway (NO), Saudi Arabia (SA), Latvia (LV), Kazakhstan (KZ),
Kuwait (KW), Cambodia (KH), Greenland (GL), Estonia (EE), Ecuador
(EC), Denmark (DK), Czech Republic (CZ), Switzerland (CH), Bulgaria (BG),
Belgium (BE), Austria (AT)
7.2 Experiment
As for the spam filtering step, a spam classifier with performance of 92.5% accuracy is trained. It is
used to filter the original Twitter dataset and discard 7.8% ‘spam’ tweets.
In the sentiment classification step, all the combinations of the previously proposed 6 features are
applied to train the subjectivity classifier and the polarity classifier. After the 378 implementations
of the experiment, Top-7 test results of the subjectivity classifiers and the polarity classifiers are
shown in Table 7.2 and Table 7.3.
Table 7.2: Subjectivity Classifiers Performance
syn 5s rv win3 cca pos accuracy
79.7%
74.9%
81.7%
82.4%
83.0%
83.1%
84.3%
Table 7.3: Polarity Classifiers Performance
syn 5s rv win3 cca pos accuracy
82.2%
89.5%
91.2%
93.6%
94.3%
94.9%
96.4%
As shown in the above tables, the best-performing subjectivity classifier (with an accuracy of
84.3%) is obtained with the feature combination of ‘syn’, ‘5s’, and ‘rv’ and the SVM RBF training
method, while the best-performing polarity classifier (with an accuracy of 96.4%) is obtained with
the feature combination of ‘rv’, ‘5s’, ‘win3’, and ‘cca’ and the SVM polynomial training method.
These two classifiers are used to sequentially classify all the travel-related Twitter data into
positive, neutral, and negative groups, and to give each tweet a sentiment score of 1, 0, or -1.
Comparing the performance of the classifiers in the tourism and restaurant domains, it can be
found that, despite the same feature combinations, the best-performing spam classifier for the
restaurant domain achieves higher accuracy than that for the tourism domain, while both the
best-performing subjectivity classifier and polarity classifier for the tourism domain outperform
their counterparts in the restaurant domain. These disparities demonstrate the difference between
the data of the two domains, and support the necessity of using domain-exclusive data for training
and testing in both the basic and the expansion experiments.
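The two-stage scoring cascade described above can be sketched as follows. The two cue-word sets stand in for the trained SVM subjectivity and polarity models, which cannot be reproduced here; only the cascade logic (objective → 0, otherwise +1 or -1) follows the experiment:

```python
# Illustrative word lists; in the experiment these decisions are made by the
# trained SVM RBF (subjectivity) and SVM polynomial (polarity) classifiers.
SUBJECTIVE_CUES = {"love", "hate", "beautiful", "terrible", "amazing", "boring"}
POSITIVE_CUES = {"love", "beautiful", "amazing"}

def is_subjective(tokens):            # stand-in for the subjectivity classifier
    return any(t in SUBJECTIVE_CUES for t in tokens)

def is_positive(tokens):              # stand-in for the polarity classifier
    return (sum(t in POSITIVE_CUES for t in tokens)
            >= sum(t in SUBJECTIVE_CUES - POSITIVE_CUES for t in tokens))

def sentiment_score(tweet):
    """Cascade: objective tweets get 0, subjective ones get +1 or -1."""
    tokens = tweet.lower().split()
    if not is_subjective(tokens):
        return 0
    return 1 if is_positive(tokens) else -1

scores = [sentiment_score(t) for t in
          ["the eiffel tower is beautiful", "queues were terrible",
           "opened in 1889"]]
print(scores)   # [1, -1, 0]
```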
7.3 Analysis
7.3.1 Statistical Analysis
First, a basic statistical analysis is carried out to obtain a general overview of these attractions in the
50 countries. Table 7.4 lists the number of preprocessed tweets for each target country. Figure 7.1
shows the distribution of tweets over the 12 attractions in each country.
Table 7.4: Tweet Amount for Each Country (Tourism Domain)
US 155,150 CN 10,002 PA 254
GB 25,733 RU 10,727 NO 227
AU 6,661 ES 14,150 SA 2,360
ID 37,361 AR 8,387 LV 243
MY 8,369 CL 4,689 KZ 267
CA 16,023 KR 2,867 KW 1,237
PH 2,932 DE 2,686 KH 203
SG 4,639 IT 4,171 GL 5,489
BR 12,879 IE 1,798 EE 208
IN 8,016 VE 8,576 EC 11,955
ZA 1,737 CO 2,652 DK 305
JP 21,512 PL 445 CZ 266
MX 5,832 EG 1,527 CH 953
FR 29,012 VN 201 BG 279
NL 19,885 SK 378 BE 1,285
GR 21,021 SI 199 AT 396
TH 28,022 SE 855
Figure 7.1: General Distribution of tweets in Tourism Domain
From the above table and distribution graph, it can be concluded that:
a) As in the restaurant domain, there is a great difference among the tweet counts for the target
countries, and tweets from the United States predominate in quantity;
b) The distribution of tweets over the 12 attractions differs considerably from country to country;
c) The overall distribution across the 12 attractions is also quite biased, in that some attractions,
such as the Eiffel Tower, receive far more tweets than others;
d) These distributions may indicate the popularity of each attraction in each country. For example,
tweets about Angkor Wat are evidently more numerous in Cambodia (KH) and Viet Nam (VN)
than in other countries, which may indicate that Angkor Wat is more popular with Cambodians
and Vietnamese than with people from other parts of the world.
7.3.2 Basic Sentiment Analysis
7.3.2.1 Polarity Distribution
To figure out the proportions of positive, negative, and objective tweets for each country and for
each attraction, the polarity distribution graphs are plotted, as shown in Figures 7.2~7.13. The rose
color stands for the positive tweets, the azure color stands for the negative tweets, and the lemon
yellow stands for the objective tweets.
Figure 7.2: Polarity Distribution for Great Wall of China
Figure 7.3: Polarity Distribution for Mount Fuji
Figure 7.4: Polarity Distribution for Matterhorn
Figure 7.5: Polarity Distribution for Sydney Opera House
Figure 7.6: Polarity Distribution for Statue of Liberty
Figure 7.7: Polarity Distribution for Colosseum
Figure 7.8: Polarity Distribution for Louvre Museum
Figure 7.9: Polarity Distribution for Grand Canyon
Figure 7.10: Polarity Distribution for Machu Picchu
Figure 7.11: Polarity Distribution for Angkor Wat
Figure 7.12: Polarity Distribution for Eiffel Tower
Figure 7.13: Polarity Distribution for Taj Mahal
From the above polarity distribution graphs, the following conclusions can be obtained:
a) For different attractions, the general distributions of positive, negative, and objective tweets
are quite different. For example, Sydney Opera House by and large has more positive and
fewer negative tweets than the Statue of Liberty, which indicates an overall better attitude
towards Sydney Opera House;
b) Objective tweets outnumber positive and negative tweets in almost all cases, and for all the
target attractions, positive tweets outnumber negative tweets by a large margin;
c) For the same attraction, people from different countries seem to hold various opinions. For
example, as for the Eiffel Tower, Italian people seem to have more complaints than people
from other countries, since Italy has a larger share of negative tweets than the other countries.
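The per-country proportions behind these graphs can be computed directly from the scored tweets; the sample (country, attraction, score) triples below are invented for illustration:

```python
from collections import Counter

# Invented sample data; scores are the +1 / 0 / -1 values assigned earlier.
scored = [("IT", "eiffel tower", -1), ("IT", "eiffel tower", 0),
          ("US", "eiffel tower", 1), ("US", "eiffel tower", 1),
          ("US", "eiffel tower", 0)]

def polarity_distribution(rows, attraction):
    """Fraction of each polarity class per country, for one attraction."""
    by_country = {}
    for country, attr, score in rows:
        if attr == attraction:
            by_country.setdefault(country, Counter())[score] += 1
    return {country: {score: n / sum(counts.values())
                      for score, n in counts.items()}
            for country, counts in by_country.items()}

dist = polarity_distribution(scored, "eiffel tower")
print(dist)   # per-country fractions of positive / objective / negative tweets
```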
7.3.2.2 Sentiment Map
After calculating the average sentiment score for each country, the sentiment maps for the target
attractions are depicted in Figures 7.14~7.25. On the gradient color axis, green represents negative
sentiment and red represents positive sentiment.
Figure 7.14: Sentiment Score Map of Great Wall of China
Figure 7.15: Sentiment Score Map of Mount Fuji
Figure 7.16: Sentiment Score Map of Matterhorn
Figure 7.17: Sentiment Score Map of Sydney Opera House
Figure 7.18: Sentiment Score Map of Statue of Liberty
Figure 7.19: Sentiment Score Map of Colosseum
Figure 7.20: Sentiment Score Map of Louvre Museum
Figure 7.21: Sentiment Score Map of Grand Canyon
Figure 7.22: Sentiment Score Map of Machu Picchu
Figure 7.23: Sentiment Score Map of Angkor Wat
Figure 7.24: Sentiment Score Map of Eiffel Tower
Figure 7.25: Sentiment Score Map of Taj Mahal
The above sentiment maps show the overall distributions of people’s opinions by representing
sentiment as gradient color. Compared to the polarity distribution graphs, this form of presentation
is more intuitive and gives a picture of the geographical relationships among the target countries.
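Assuming the per-country averages are bounded by the tweet scores of ±1, the gradient coloring on the maps can be sketched as a linear interpolation between green (negative) and red (positive); the exact color scheme of the plotting tool is not reproduced here:

```python
def score_to_color(avg, lo=-1.0, hi=1.0):
    """Map an average sentiment score to an (R, G, B) value on the map's
    green-to-red gradient: green = negative, red = positive."""
    t = (avg - lo) / (hi - lo)                 # normalise score into [0, 1]
    return (int(255 * t), int(255 * (1 - t)), 0)

# Endpoints and midpoint of the gradient:
print(score_to_color(-1.0), score_to_color(0.0), score_to_color(1.0))
```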
7.3.2.3 Sentiment Keywords Extraction
As in the restaurant domain, beyond the overall distributions of people’s opinions of the target
attractions, we want more information about the specific reasons why people like or dislike a target,
and about the concrete characteristics of a target that shape people’s opinions. To this end, the
frequently occurring sentiment words, either positive or negative, are extracted together with their
frequencies for each target attraction, and tag clouds are used to present these representative
sentiment keywords. Figures 7.26~7.37 give the tag clouds for the targets. A white background
indicates positive sentiment, and a black background indicates negative sentiment. The size of a
word denotes its occurrence frequency, and the colors of the words have no special significance.
Figure 7.26: Tag Cloud of Sentiment Keywords for Great Wall of China
Figure 7.27: Tag Cloud of Sentiment Keywords for Mount Fuji
Figure 7.28: Tag Cloud of Sentiment Keywords for Matterhorn
Figure 7.29: Tag Cloud of Sentiment Keywords for Sydney Opera House
Figure 7.30: Tag Cloud of Sentiment Keywords for Statue of Liberty
Figure 7.31: Tag Cloud of Sentiment Keywords for Colosseum
Figure 7.32: Tag Cloud of Sentiment Keywords for Louvre Museum
Figure 7.33: Tag Cloud of Sentiment Keywords for Grand Canyon
Figure 7.34: Tag Cloud of Sentiment Keywords for Machu Picchu
Figure 7.35: Tag Cloud of Sentiment Keywords for Angkor Wat
Figure 7.36: Tag Cloud of Sentiment Keywords for Eiffel Tower
Figure 7.37: Tag Cloud of Sentiment Keywords for Taj Mahal
Compared with the tag clouds of the restaurant domain, the tag clouds for world attractions seem
to be more informative and meaningful. For instance, words like ‘famous’, ‘masterpiece’, ‘renais-
sance’, ‘worthy’, ‘treasure’, and ‘gorgeous’ are particular to or representative of the positive aspect
of the Louvre Museum, while words like ‘crowded’, ‘dirty’, ‘boring’, and ‘confusing’ may relate
to its negative aspect. Likewise, for Mount Fuji, the positive features can be described by
‘beautiful’, ‘clear’, ‘milky’, ‘blossom’, ‘picturesque’, and ‘fresh’, while the negative descriptions
include words like ‘suicide’, ‘cold’, ‘dangerous’, ‘frozen’, and ‘invisible’. Based on these special
keywords, we may easily obtain important hints about, or underlying facts behind, the pros and
cons of the target attractions. For example, words such as ‘battle’, ‘fight’, ‘blood’, ‘death’,
‘brutality’, ‘beast’, and ‘barbaric’ in the negative keyword set of the Colosseum lend sufficient
support to the inference that a trip to the Colosseum may remind tourists of its cruel history in
ancient Rome. Also, people think of the Eiffel Tower as a romantic, gorgeous place, so when
referring to the Tower, they use words like ‘romantic’, ‘lover’, ‘illuminating’, ‘kiss’, ‘dream’,
‘sparkling’, and ‘splendor’.
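The keyword-frequency step behind these tag clouds can be sketched with a simple lexicon count; the positive and negative lexicons and the tweets below are invented stand-ins for the real sentiment dictionaries:

```python
from collections import Counter

# Hypothetical mini-lexicons and tweets for one attraction.
positive_lexicon = {"romantic", "gorgeous", "sparkling"}
negative_lexicon = {"crowded", "overpriced"}

tweets = ["the eiffel tower is so romantic and sparkling",
          "romantic view, gorgeous at night",
          "too crowded and overpriced",
          "crowded again today"]

pos, neg = Counter(), Counter()
for tweet in tweets:
    for word in tweet.replace(",", " ").split():
        if word in positive_lexicon:
            pos[word] += 1
        elif word in negative_lexicon:
            neg[word] += 1

# The most frequent words get the largest font in the tag cloud.
print(pos.most_common(), neg.most_common())
```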
7.3.2.4 (Attribute, Value) Pairs Extraction
Beyond the above tag clouds, we still want more detailed information and a closer look at these
world attractions. As in the restaurant domain, the sentiment-expressive word pairs are extracted,
each of which typically, though not exclusively, consists of one noun (attribute) and one adjective
(value). Tables 7.5~7.16 give the (Attribute, Value) lists of the target attractions. Red color and
green color represent positive and negative sentiment respectively. The numbers following the
value words denote frequencies.
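The pair-extraction step can be sketched as a pattern match over POS-tagged tweets, pairing each noun (attribute) with an adjacent or copula-linked adjective (value). The tagged input below is hand-made, standing in for an upstream POS tagger:

```python
from collections import Counter

tagged_tweets = [  # (token, tag) with NN = noun, JJ = adjective; sample data
    [("the", "DT"), ("great", "JJ"), ("wall", "NN"),
     ("is", "VB"), ("famous", "JJ")],
    [("beautiful", "JJ"), ("scenery", "NN")],
    [("the", "DT"), ("scenery", "NN"), ("was", "VB"), ("beautiful", "JJ")],
]

def extract_pairs(tagged):
    """Count (noun, adjective) pairs: an adjective directly before a noun,
    or the first adjective following it (e.g. linked via a copula)."""
    pairs = Counter()
    for sent in tagged:
        for i, (tok, tag) in enumerate(sent):
            if tag == "NN":
                if i > 0 and sent[i - 1][1] == "JJ":      # "great wall"
                    pairs[(tok, sent[i - 1][0])] += 1
                for tok2, tag2 in sent[i + 1:]:           # "wall is famous"
                    if tag2 == "JJ":
                        pairs[(tok, tok2)] += 1
                        break
    return pairs

pairs = extract_pairs(tagged_tweets)
print(pairs)
```

Aggregating these counts per target attraction yields lists of the same shape as Tables 7.5~7.16.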
Table 7.5: (Attribute, Value) list of Great Wall of China
Attribute Value
greatwall first 61, real 53, white 38, chinese 33, old 27, long 26, famous 21, positive 14, free 13, high
13, large 13, artificial 11, good 11, visible 11, longest 10, useful 9, best 9, international 8,
largest 8, particular 8, northern 7, big 7, chángchéng 6, robust 6, ancient 6, beautiful 6,
enormous 5, greater 5, manual 5, north 5, fantastic 5, favorite 5, huge 5, original 4, technical 4
space visible 256, great 102, real 6, international 5
cemetery longest 413, earth 412, largest 23
wonder new 24, great 10, ancient 5, world 5
photos great 342, light 32, pile 14, famous 6, favorite 5
building great 190, visible 12, biggest 9, long 7, high 7, chinese 5, largest 5
length great 79, total 27, full 18, entire 10, conventional 9, central 6
heritage cultural 21, great 8, eternal 7, mutual 6, desirable 5, famous 5
information useful 17, give 12, private 10, chinese 9, great 7, important 6, interesting 6
dynasty old 19, great 7, boundary 6, successive 6
walk great 101, lazy 21, whole 15, confident 10, difficult 7, entire 7, long 5, toughest 5
scenery beautiful 52, great 12, incredible 7
Table 7.6: (Attribute, Value) list of Mount Fuji
Attribute Value
mountfuji beautiful 282, japanese 59, best 41, favorite 32, highest 28, high 27, good 22, famous 19,
flower 18, cultural 18, visible 17, top 17, big 17, eggplant 16, shizuoka 15, scenic 15, clear 13, rich
13, sunny 12, clean 11, dangerous 10, green 10, majestic 10, nice 10, powerful 9, available 9,
black 8, environmental 8, gorgeous 8, milky 7, prefectural 7, special 7, white 6, disappointing
5, distant 5, great 5, large 5, natural 5, spectacular 5
forest suicide 3128, high 26, thin 26, snowy 14, beautiful 8, haunted 7, scary 7, special 5
cloud lenticular 38, top 10, famous 8, beautiful 6
mountain highest 155, beautiful 15, famous 9, japanese 8, immortal 6, big 5, good 5, high 5, huge 5
view beautiful 31, great 11, clear 10, autumnal 9, good 9, panoramic 8, distant 6, beautiful 6,
different 5, wonderful 5, breathtaking 5
sky blue 11, beautiful 9, clear 8, special 6, clean 5
beauty majestic 12, natural 8, mystical 7, beautiful 6, great 6, fantastic 5
Table 7.7: (Attribute, Value) list of Matterhorn
Attribute Value
matterhorn italian 620, fliegner 146, zermattlive 36, thunder 34, national 26, beautiful 18, good 15,
gornegrat 12, best 11, famous 10, fantastic 10, mysterious 10, large 8, swiss 8, majestic 8,
wonderous 7, big 6, great 6, blue 6, classic 6, favorite 5, impressive 5, top 5
mountain magic 218, famous 19, everest 16, swiss 12, main 10, high 8, beautiful 6, big 6, fliegner 6
switzerland fantastic 17, beautiful 13, special 7
zermatt royal 16, beautiful 13, free 8, best 7, cardinal 7, glacier 7, glorious 5, good 5, great 4, marvelous 4, special 4
alps swiss 17, highest 10, glacier 5, italian 5, european 5, great 5
view nice 13, great 8, bad 6, beautiful 6, classic 6, spectacular 5
photo vurtual 12, wonderous 7, amazing 6, best 5, magical 5, majestic 5
Table 7.8: (Attribute, Value) list of Sydney Opera House
Attribute Value
Sydney opera house new 380, vivid 60, beautiful 19, mobile 12, famous 22, conceptual 20, initial 18, iconic 89, great 30, beautiful 11, new 9, open 9, spectacular 8, big 8, best 7, classic 6, epic 6, impressive 6, large 5, monumental 5, wonderful 5
fireworks new 386, spectacular 24, first 12, current 9, massive 8, open 8, anniversary 6, beautiful 6,
traditional 6
night beautiful 12, great 11, incredible 5, magical 5, special 5
concert famous 10, live 8, full 6, large 6
heritage cultural 13, immortal 5, unique 5
harbour beautiful 18, modern 9, new 6, iconic 5, light 5
building modern 19, classic 8, great 7, royal 7, shine 6, circular 5, iconic 5, important 5
show fantastic 12, final 10, new 8, beautiful 5, full 5, good 5, light 5
Table 7.9: (Attribute, Value) list of Statue of Liberty
Attribute Value
statue of liberty present 2476, new 589, original 302, high 179, snow 176, top 119, rain 85, available 77, black 65, major 64, european 44, big 42, small 39, beautiful 33, good 28, own 22, resemble 22, visible 22, old 21, italian 18, real 16, classic 14, national 13, american 11, green 9, greatest 8, nice 8, memorial 7, cultural 7, gorgeous 6, best 6, cute 6, great 6, famous 6, iconic 6, tall 6, visible 5, incredible 5, french 5, huge 5, solid 5, commemorative 5, giant 5, manhattan 5, open 5, contemporary 5, modern 5, creative 5
torch good 79, original 10, green 6, impressive 6
view beautiful 24, top 13, great 10, nice 7, clear 5, gorgeous 5, manhattan 5
photo famous 113, unique 113, rare 10, historic 5, great 5, original 5
history sad 70, natural 10, american 6, various 6, real 5
park central 46, national 16, new 11, main 7, cultural 5, interesting 5, small 5
tour new 23, incredible 12, finest 8, boat 6, great 6
place wrong 15, best 10, dangerous 9, mythical 7, better 7, biggest 6, favorite 6, great 6, interesting
5, memorial 5
Table 7.10: (Attribute, Value) list of Colosseum
Attribute Value
coloseum roman 271, vatican 223, beautiful 42, iconic 27, famous 25, ancient 24, modern 21, good 21,
great 19, big 19, popular 18, huge 18, legendary 18, magnificent 17, special 16, eternal 16,
favorite 15, largest 15, bad 14, gorgeous 14, greatest 14, historic 14, spectacular 13, awesome
13, biggest 13, classical 12, immortal 12, impressive 12, incredible 12, large 11, wonderful 10
rome beautiful 63, ancient 47, vatican 26, archaeological 20, famous 16, historic 15, iconic 14, classic 14, eternal 12, great 10, incredible 10
city vatican 238, magical 27, big 25, dangerous 19, italian 16, beautiful 15, eternal 13, major 11,
bad 10, gorgeous 10
time first 135, next 57, long 37, great 21, roman 16, greatest 15, free 12, possible 12, ancient 11,
considerable 10
heritage best 34, mutual 33, cultural 11, historical 11
place different 29, favorite 26, good 22, beautiful 18, classical 15, lucky 14, spectacular 14, bad 12,
big 12, interesting 11
emperor ancient 28, stupid 22, roman 16, ephemeral 11, pragmatic 10, rich 10, vespasian 8
monument famous 19, historic 15, iconic 15, european 13, architectural 11, beautiful 9
building dangerous 14, previous 13, impressive 12, mediterranean 12, roman 10, cathedral 9, favorite 8
Table 7.11: (Attribute, Value) list of Louvre Museum
Attribute Value
Louvre museum great 201, marble 104, famous 82, national 75, unusual 74, beautiful 66, major 27, good 26, pyramid 25, large 25, cultural 25, big 18, immersive 17, largest 16, best 15, wide 14, french 12, original 12, spectacular 12, cathedral 11, biggest 11, special 10, majestic 10, exclusive 9, favorite 9, iconic 8, perfect 8, wonderful 8, nice 8, imaginary 7, incredible 7, gorgeous 7, huge 6, interesting 6, majestic 5, important 5, natural 5, royal 5, bad 5
paris beautiful 29, famous 19, good 17, favorite 14, cultural 12, documentary 10, worth 9, cathedral
8, cool 7, incredible 6
photo beautiful 11, romantic 10, cute 10, classic 7, royal 7, incredible 6
art islamic 30, famous 16, important 15, beautiful 14, contemporary 14, modern 13, real 12, great 12, western 10, academic 10, asian 10, classic 10, conceptual 9, eastern 9, egyptian 9, incredible 9, national 8, religious 8, superior 6, worthy 6
painting famous 305, italian 25, european 15, royal 14, beautiful 8, favorite 6
place secure 177, beautiful 23, great 22, special 20, good 19, interesting 17, favorite 15, best 11,
famous 11, favorite 10
exhibition special 20, international 17, islamic 16, mediterranean 15, large 13, cool 12, french 10, great
10, japanese 9, modern 9, ancient 8, beautiful 8
masterpiece worthy 190, crazy 14, neoclassical 12, favorite 11, specific 10, various 8
Table 7.12: (Attribute, Value) list of Grand Canyon
Attribute Value
grand canyon national 1507, green 236, great 176, natural 154, beautiful 134, incentive 112, best 74, common 49, cool 37, big 37, good 33, large 30, huge 28, original 25, special 24, american 21,
gorgeous 20, largest 20, majestic 19, famous 17, vertical 16, wonderful 16, giant 16, epic 15,
bad 14, nice 14, fabulous 11, incredible 11, spectacular 10, top 10, catastrophic 9, celestial 8,
dangerous 8, quiet 8, rocky 8, worth 8, glorious 7, scary 6, breathtaking 6, scenic 5
fog heavy 1622, massive 16, breathtaking 8, great 6, rare 6
valley large 84, beautiful 35, national 9, majestic 6, best 6, big 5, good 5
trip spontaneous 15, best 14, great 13, recent 11, good 10, important 10, memorable 9, national 8
view nice 45, beautiful 13, spectacular 12, great 11, best 10, common 10, google 8, new 7, rustic 7, gorgeous 6, panoramic 6, bad 6, breathtaking 6
phenomenon rare 271, atmospheric 34, natural 20, beautiful 17, great 11, gorgeous 10, mysterious 7, special
6
place beautiful 25, exotic 23, great 20, best 15, good 14, public 13, dangerous 13, peaceful 12, nice
10, wonderful 10, exceptional 8, magical 7
heritage hidden 33, best 31, natural 24, majestic 15, world 14, mutual 13, visible 11, cultural 8, dangerous 6, national 6
Table 7.13: (Attribute, Value) list of Machu Picchu
Attribute Value
machu picchu best 837, historical 168, old 167, beautiful 103, ancient 65, top 46, important 38, incredible 32,
historic 31, centenary 27, great 26, cultural 25, botanical 22, famous 18, unforgettable 17, wild
17, good 15, favorite 12, wonderful 11, majestic 10, gorgeous 9, impressive 9, mysterious 7,
national 7, nice 6, popular 6, unique 6, agricultural 5, fantastic 5, healthy 5, indigenous 5,
interesting 5
travel best 65, expensive 9, extraordinary 7, important 7, cultural 6
city mysterious 185, lost 63, ancient 48, familiar 19, legendary 11, good 10, enigmatic 9, indigenous 7, large 6, beautiful 5
place best 953, historical 824, worldwide 156, mysterious 26, historic 17, beautiful 13, magical 11,
fantastic 11, wonderful 6
guide shamanic 178, andean 162, famous 72, spiritual 56, famed 46
heritage important 21, historic 16, natural 10, cultural 8, famous 5, popular 5
stairs vertical 191, dangerous 18, various 8
ruins major 19, full 18, historic 15, ancient 12, famous 10, fantastic 8, incredible 7
Table 7.14: (Attribute, Value) list of Angkor Wat
Attribute Value
angkorwat popular 602, famous 168, international 79, ancient 43, beautiful 46, largest 38, great 31, good 24, cultural 22, mysterious 22, nice 19, various 18, spiritual 16, rustic 15, religious 12, archeological 10, best 9, cambodian 9, spiritual 8, panoramic 7, impressive 7, top 6, traditional 6
temple largest 48, ancient 29, imperial 18, golden 15, big 13, huge 11, buddhist 8, beautiful 7, gorgeous 5, religious 5
heritage world 38, cultural 26, famous 8, fashionable 8, great 7, popular 6, bad 6, beautiful 5
travel overseas 27, domestic 14, popular 12, national 10, cultural 6
time private 78, spacious 68, long 16, tropical 12, limited 10, peaceful 9, closing 8, wonderful 6
ruins famous 16, particular 14, mysterious 13, great 10, spiritual 11, huge 10, ancient 8, good 5
city ancient 23, exotic 20, imperial 17, historic 8, magical 5, beautiful 5
place good 13, best 12, great 10, beautiful 6, religious 6, unbelievable 6
people local 116, architectural 20, reminiscent 18, rustic 13, special 10, materialistic 8, shy 6
Table 7.15: (Attribute, Value) list of Eiffel Tower
Attribute Value
eiffeltower beautiful 751, new 621, tallest 423, lighter 221, cute 165, wonderful 152, high 132, big 130,
good 117, top 106, romantic 99, famous 98, great 84, best 72, different 65, glamorous 63,
global 48, colorful 46, nice 45, visible 44, electric 43, gorgeous 43, large 41, old 37, bad 36,
highest 35, cool 33, original 33, french 27, perfect 26, giant 25, huge 25, favorite 23, fashionable 22, magical 22, special 22, spectacular 18, iconic 15, gorgeous 14, bright 14, incredible 13, fantastic 12, hilarious 11, majestic 10, magnificent 10
paris romantic 3122, beautiful 401, wonderful 174, top 49, best 45, famous 32, good 28, big 17,
french 15, large 10
france various 457, beautiful 193, romantic 35, honeymoon 14, good 10
view spectacular 880, beautiful 792, amazing 573, different 78, top 48, great 19, artistic 13, wonderful 12, incredible 12, google 10, good 8, nice 6, magnificent 5, fantastic 5, wonderful 5
picture beautiful 1296, wallpaper 36, good 19, phenomenal 11, wonderful 7, amazing 6
place popular 1733, romantic 68, exotic 63, favorite 43, happy 30, beautiful 27, best 10, famous 8,
good 6, special 5
photo wonderful 382, rare 234, instant 44, good 29, glamorous 24, beautiful 22, gorgeous 17, special 11, best 10, beautiful 8, great 7
fireworks beautiful 20, happy 12, romantic 12, special 10
Table 7.16: (Attribute, Value) list of Taj Mahal
Attribute Value
tajmahal beautiful 127, red 108, famous 70, private 51, magnificent 36, great 23, prestigious 23, big 22,
good 20, golden 19, classic 18, iconic 17, magical 17, modern 17, open 15, royal 15, expensive 14, indian 12, majestic 12, special 12, greatest 9, incredible 8, peaceful 8, architectural 7,
authentic 6, best 6, favorite 5, gorgeous 5, marble 5, symmetrical 5
india cultural 34, incredible 32, good 27, ancient 21, magical 15, greatest 13, awesome 10, delicious
8, famous 6, historical 6
story true 28, beautiful 23, sad 13, greatest 11, new 8, eternal 5
tour private 34, magical 12, perfect 12, special 10, indian 6, incredible 5
place divine 29, beautiful 14, best 10, favorite 8, great 5, terrible 5
building huge 25, marble 22, white 14, beautiful 10, impressive 8, funeral 6, original 6
city atlantic 73, famous 13, indian 10, blue 8, authentic 6, beautiful 5, industrial 5, interesting 5, romantic 5, expensive 4
architecture islamic 16, mughal 12, beautiful 8, marvellous 7, great 6, persian 5, historical 4
The above (Attribute, Value) lists contain rich information about the characteristics of the target
attractions. From these lists, it is relatively easy to identify the specific reasons behind people's
opinions of each attraction.
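The construction of such lists can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes the tweets have already been tokenized and POS-tagged by some standard tagger, and simply counts adjectives that immediately precede each noun.

```python
from collections import Counter, defaultdict

def attribute_value_pairs(tagged_sentences, min_count=5):
    """Count (attribute, value) pairs, where a 'value' is an adjective
    (tag 'ADJ') immediately preceding a noun (tag 'NOUN') 'attribute'.
    `tagged_sentences` is a list of [(token, tag), ...] lists."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "ADJ" and t2 == "NOUN":
                counts[w2.lower()][w1.lower()] += 1
    # Keep values seen at least `min_count` times (the tables above cut
    # off at a count of 5), sorted by descending frequency.
    return {attr: [(v, c) for v, c in vals.most_common() if c >= min_count]
            for attr, vals in counts.items()}
```

Each returned entry then corresponds to one row of the tables above: a noun attribute mapped to its most frequent adjective values.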
7.3.3 Culture-based Analysis
Next, the k-means method is used to cluster the 50 countries into several groups according to their
sentiment scores over the 12 attractions. Here, k is empirically set from 4 to 10, and Figures 7.38 to
7.44 show world maps presenting the corresponding clustering results. Countries filled with the same
color belong to the same cluster.
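The clustering step can be sketched roughly as below. The country codes and the score matrix here are random placeholders standing in for the real 50-by-12 sentiment score matrix, and scikit-learn's KMeans stands in for whatever k-means implementation was actually used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data: rows are countries, columns are the 12 attractions.
countries = ["US", "GB", "AU", "CA", "JP", "CN", "KR", "EG", "SA", "KW", "MY", "PH"]
rng = np.random.default_rng(0)
scores = rng.random((len(countries), 12))

for k in range(4, 11):  # the thesis sweeps k from 4 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores)
    groups = {}
    for country, label in zip(countries, km.labels_):
        groups.setdefault(label, []).append(country)
    print(k, sorted(groups.values()))
```

The resulting label-to-country groupings are what the world maps in Figures 7.38 to 7.44 visualize, one map per value of k.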
Figure 7.38: Clustering Result Map (k=4)
Figure 7.39: Clustering Result Map (k=5)
Figure 7.40: Clustering Result Map (k=6)
Figure 7.41: Clustering Result Map (k=7)
Figure 7.42: Clustering Result Map (k=8)
Figure 7.43: Clustering Result Map (k=9)
Figure 7.44: Clustering Result Map (k=10)
Based on these clustering result maps, the following observations can be made:
a) When k is set to 4, EG, SA, and KW form a group, which supports both the location-based and
language-based cultural effects. Moreover, most European countries are clustered into the
same group, which also suggests the existence of location-based cultural effects;
b) When k is set to 5, the European countries are further split into two groups, roughly along the
boundary between Western and Eastern Europe. This phenomenon also, to some extent,
demonstrates the location-based cultural effects;
c) When k is set to 6, RU, KZ, and PL become a separate group, which indicates the
location-based cultural effect;
d) When k is set to 7, AR, PA, CO, and EC form a new group, manifesting the location-based
and language-based cultural effects;
e) When k is set to 8, the two English-speaking Southeast Asian countries, MY and PH, form a
separate cluster, which also suggests the location-based and language-based cultural effects;
f) When k is set to 9 and 10, the clusters are further subdivided into smaller groups, but little
additional insight is gained.
Focusing only on the k=8 case, the clustering result is:
US, GB, AU, CA, SG, BR, JP, NL, GR, CN, KR, VE, CL, IE, SE;
RU, PL, KZ;
MY, PH;
ID, MX, TH, VN, KH;
DE, SK, SI, EE, LV, BG;
IN, ZA, FR, ES, IT, NO, GL, DK, CZ, CH, BE, AT;
AR, PA, CO, EC;
EG, KW, SA.
Based on the above clustering result, several conclusions can be drawn.
a) The location-based cultural effects on user evaluations for world attractions are obvious
for some countries, such as the group of MY and PH, the group of EG, KW, and SA, and
neighboring countries in Europe, North America, and Southeast Asia;
b) The language-based cultural effects also exist: the most typical English-speaking countries,
including US, GB, AU, and CA, are in the same cluster;
c) When considering opinions towards world attractions, the boundary between North
America and South America becomes blurred, especially compared with the clustering result
in the restaurant domain;
d) It seems somewhat surprising that the three main East Asian countries, i.e., JP, CN, and KR,
are all clustered into the same group as the American countries. To explain this result, some
underlying factors should be further investigated.
7.3.4 Comparison of the Two Domains
Finally, by comparing the clustering results of the two domains, it can be concluded that:
a) The cultural effects on user evaluations differ across domains. While some countries
constantly belong to the same cluster, the groupings of other countries do not remain the
same;
b) The disparity of attitudes between North America and South America seems to be more
significant towards food than towards traveling;
c) From the results of both fields, it appears that the Asian countries have rather varied cultural
backgrounds in comparison with other areas;
d) As for European countries, despite their relatively small territorial area, people's evaluations
vary from culture to culture, which may be attributed to the diversity of languages in these
countries;
e) The main English-speaking countries seem to share more similarities in cultural effects on
user evaluations regardless of the field, which may indicate language-based cultural effects;
f) For both domains, the location-based cultural effects are quite obvious, which means that
countries in close geographical proximity tend to hold similar attitudes towards restaurants
or attractions;
g) The limitations of the experiments may have led to some unexpected results. For example, in
both cases, CN has been grouped with Western countries, which may be partly due to the
blocking of Twitter in mainland China.
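As a complement to this qualitative comparison, the agreement between the two domain-level groupings could also be quantified, for example with the adjusted Rand index. The sketch below uses hypothetical placeholder labels for a handful of countries, not the actual clustering output.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for a few countries in each domain;
# these are illustrative placeholders, not the thesis results.
countries = ["US", "GB", "AU", "CA", "EG", "KW", "SA", "MY", "PH"]
restaurant_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2]
tourism_labels    = [0, 0, 0, 0, 1, 1, 1, 0, 0]

ari = adjusted_rand_score(restaurant_labels, tourism_labels)
print(f"Adjusted Rand index between the two domains: {ari:.2f}")
# 1.0 means identical groupings; values near 0 mean chance-level agreement.
```

A single score of this kind would make statements such as "some countries constantly belong to the same cluster" directly measurable.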
8. Conclusion
In this research, the relationship between user evaluations and cultural backgrounds in the
restaurant domain was first investigated. The investigation covered more than 30 countries
around the world, and tweets written in more than 30 languages were analyzed. The main steps
included data preprocessing, spam filtering, subjectivity classification, polarity classification, and
a series of analyses. Three key classifiers (i.e., a spam classifier, a subjectivity classifier, and a
polarity classifier) were trained with a range of different implementations, achieving accuracies
of 97.8%, 78.4%, and 91.1%, respectively. The subsequent steps of statistical analysis, basic
sentiment analysis, and culture-based analysis produced instructive results concerning the cultural
effects on user evaluations for restaurants.
Then, the same sentiment analysis approach was applied to the tourism domain to demonstrate
the transferability of the proposed methods. The three main classifiers achieved accuracies of
92.5%, 84.3%, and 96.4%, respectively. By applying these classifiers sequentially, a series of
sentiment analyses were carried out for the tourism domain, and informative results were obtained.
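A rough sketch of how such a sequential pipeline fits together is shown below. The toy training texts and the TF-IDF/logistic-regression models are placeholders; the thesis's actual features, annotated training data, and classifier implementations are not reproduced here.

```python
# Minimal sketch of a three-stage filtering pipeline
# (spam -> subjectivity -> polarity), using generic scikit-learn models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train(texts, labels):
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    return clf

# Toy training data (placeholders for the annotated tweets).
spam_clf = train(["win free money now", "the food here was great"],
                 ["spam", "ham"])
subj_clf = train(["the food here was great", "the restaurant opens at 9"],
                 ["subjective", "objective"])
pol_clf = train(["the food here was great", "the service was terrible"],
                ["positive", "negative"])

def classify(tweet):
    """Apply the three classifiers in sequence, discarding tweets
    that fail the earlier stages."""
    if spam_clf.predict([tweet])[0] == "spam":
        return "discarded: spam"
    if subj_clf.predict([tweet])[0] == "objective":
        return "discarded: objective"
    return pol_clf.predict([tweet])[0]
```

Only tweets that survive the spam and subjectivity stages reach the polarity classifier, which mirrors the sequential application described above.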
Through these cross-domain investigations of user evaluations, the conclusion can be reached that
the cultural effects on user evaluations in both the restaurant and tourism domains actually exist,
and are quite obvious for some countries and cultural backgrounds. The proposed approach has
also proved capable of cross-lingual sentiment analysis, and is transferable to other fields.
As the next steps, first, latent factors other than cultural background should be further
investigated, in order to uncover the underlying facts that can explain the unexpected results for
certain countries. Then, other possible expansions, including expansion to other domains, should
be considered.
Acknowledgements
Upon the completion of this thesis, I would like to extend my heartfelt gratitude to a number of
people.
First, my faithful gratitude goes to Prof. Yamana, my supervisor, who has taught and supported
me so much throughout my two years of graduate life. Owing to his insightful guidance and
comments on my research, as well as his patient revisions, this thesis has eventually come to
fruition.
My sincere acknowledgement also goes to all the professors and teachers who have taught me
during the master's course. It is precisely because of their careful and responsible teaching that I
was able to lay a solid foundation for my study and research.
I would also like to express my cordial thanks to the students in the Yamana laboratory for their
valuable advice and enthusiastic help, in both academic matters and daily life.
Finally, I give my heartiest gratitude to Ting Hsin Group and Waseda University for their
generous and constant support of my studies in Japan. The full-scholarship master's program has
provided me with precious opportunities to acquire advanced knowledge, to broaden my horizons
and mind, and to pursue my ambitions and visions.
References
[1] Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of Natural Language Processing, 2, 627-666.
[2] Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011, June). Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media (pp. 30-38). Association for Computational Linguistics.
[3] Brody, S., & Diakopoulos, N. (2011, July). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 562-570). Association for Computational Linguistics.
[4] Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining. In LREC.
[5] Meng, X., Wei, F., Liu, X., Zhou, M., Li, S., & Wang, H. (2012, August). Entity-centric topic-oriented opinion summarization in Twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 379-387). ACM.
[6] Wong, F. M. F., Sen, S., & Chiang, M. (2012, August). Why watching movie tweets won't tell the whole story? In Proceedings of the 2012 ACM Workshop on Online Social Networks (pp. 61-66). ACM.
[7] Guo, H., Zhu, H., Guo, Z., Zhang, X., & Su, Z. (2010, October). OpinionIt: A text mining system for cross-lingual opinion analysis. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (pp. 1199-1208). ACM.
[8] Bautin, M., Vijayarenu, L., & Skiena, S. (2008, April). International sentiment analysis for news and blogs. In ICWSM.
[9] Nakasaki, H., Kawaba, M., Utsuro, T., & Fukuhara, T. (2009). Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs. In Computer Processing of Oriental Languages: Language Technology for the Knowledge-based Economy (pp. 213-224). Springer Berlin Heidelberg.
[10] Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 10, 178-185.
[11] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
[12] Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1-12.
[13] Barbosa, L., & Feng, J. (2010, August). Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 36-44). Association for Computational Linguistics.
[14] Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! ICWSM, 11, 538-541.
[15] Saif, H., He, Y., & Alani, H. (2012). Semantic sentiment analysis of Twitter. In The Semantic Web - ISWC 2012 (pp. 508-524). Springer Berlin Heidelberg.
[16] Hu, X., Tang, L., Tang, J., & Liu, H. (2013, February). Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (pp. 537-546). ACM.
[17] Cesarano, C., Picariello, A., Recupero, D. R., & Subrahmanian, V. S. (2007). The OASYS 2.0 opinion analysis system. ICWSM, 7, 313-314.
[18] Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. ACM Transactions on Information Systems (TOIS), 26(3), 12.
[19] Cui, A., Zhang, M., Liu, Y., & Ma, S. (2011). Emotion tokens: Bridging the gap among multilingual Twitter sentiment analysis. In Information Retrieval Technology (pp. 238-249). Springer Berlin Heidelberg.
[20] Gao, Q., Abel, F., Houben, G. J., & Yu, Y. (2012). A comparative study of users' microblogging behavior on Sina Weibo and Twitter. In User Modeling, Adaptation, and Personalization (pp. 88-101). Springer Berlin Heidelberg.
[21] Hardoon, D., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639-2664.
[22] Chaudhuri, K., Kakade, S. M., Livescu, K., & Sridharan, K. (2009, June). Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 129-136). ACM.
[23] Faridani, S., Bitton, E., Ryokai, K., & Goldberg, K. (2010, April). Opinion Space: A scalable tool for browsing online comments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1175-1184). ACM.
[24] Brody, S., & Elhadad, N. (2010, June). An unsupervised aspect-sentiment model for online reviews. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 804-812). Association for Computational Linguistics.
Publications
Published:
Le, J., & Yamana, H. (2013, November). A comparative study of user evaluations of global res-
taurants under multi-cultural backgrounds. In WebDB Forum.
Le, J., & Yamana, H. (2014, March). Cross-lingual investigations of user evaluations for global
restaurants. In DEIM 2014 (B4).
To be published:
Le, J., & Yamana, H. (2014, August). Cross-domain investigations of user evaluations under the
multi-cultural backgrounds. In the 159th DBS Workshop.
Le, J., & Yamana, H. (2014, September). Cross-cultural investigations of user evaluations for mul-
tiple domains: using Twitter data. In SICSS 2014.