ENTER 2015 Research Track Slide Number 1
Analyzing User Reviews in Tourism with Topic Models
Marco Rossetti, Fabio Stella, Longbin Cao and Markus Zanker*
Alpen-Adria-Universität Klagenfurt, Austria
mzanker@acm.org
http://www.aau.at
* The presenter acknowledges the financial support of the European Union (EU), the European Regional Development Fund (ERDF), the Austrian Federal Government and the State of Carinthia in the Interreg IV Italien-Österreich programme (project acronym O-STAR).
Agenda
• Motivation
• Topic Models
• Application scenarios
• Results
• Conclusions
Motivation
• Ever-growing volumes of data
– ~200 million reviews on Tripadvisor
– Valuable source of opinions
• Need for automated processing of data harvested from the Web.
• Two principal (research) directions
– Machine Learning (ML): fitting general-purpose statistical models to data
– Semantic Web: aims to move from the traditional "unstructured" Web to a web of data (annotating data with semantic descriptors and providing efficient reasoning mechanisms)
• Topic models belong to the ML direction, but promise to detect semantic ties between words
Topic Model 1/3
• Method to organize, search and summarize electronic documents
• "..algorithms for discovering the themes that pervade a large and otherwise unstructured collection of documents." [Blei, CACM, 2012]
• Unsupervised learning strategy that builds on the basic idea:
– Big corpus of documents such as reviews
– Uncover hidden topical patterns
– Annotate documents according to those topics
Topic Model 2/3
• Topic: coherent and meaningful bag of words
• Words: can be related to several topics (homonyms)
• Documents: can be about several topics
• Example: documents can be about cats and dogs:
– Kitten, cat, meow..
– Dog, bone,…
Topic Model 3/3
• Intuition: Topics are probability distributions over words, and these discrete distributions generate the observations (words in documents).
• Computation task: Compute the topic structure given the observations (posterior). This requires an approximation of:
– the distribution over words for each topic
– the topic proportions for each document
– the topic assignment of each occurrence of a word in a document
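The three quantities listed above can be approximated in several ways. The slides do not say which topic model variant or inference method was used, so the following is only a minimal sketch under stated assumptions: it uses collapsed Gibbs sampling for Latent Dirichlet Allocation (LDA), and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy cat/dog corpus are illustrative choices, not taken from the talk.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Point estimates: word distribution per topic, topic proportions per document.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta, assignments


# Toy corpus built from the cat/dog example words (assumed data).
docs = [
    "kitten cat meow cat".split(),
    "dog bone dog".split(),
    "cat meow dog bone".split(),
]
vocab, phi, theta, assignments = lda_gibbs(docs, num_topics=2)
```

Here `phi` holds the distribution over words for each topic, `theta` the topic proportions for each document, and `assignments` the topic assigned to each word occurrence. On a corpus this small the result depends heavily on the random seed, so it should be read as a demonstration of the mechanics only.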
Example
Topic "Location"    Topic "Food"    Topic "Rooms"
walking_distance    breakfast       shower
station             service         bathroom
city_centre         restaurant      mattress
metro               bar             room
close               food            tv
“The hotel was right in the centre of the city, at walking distance from the city centre! Huge breakfast with nice food!”
“I stayed in this hotel with my friends, the room was cheap, but the shower was broken and the mattress was very hard!”
“The room was nice, with a flat tv, but the breakfast was so poor! I didn’t have enough food.”
Topic labels shown in the figure: Room, Food, Location
Goal and Contributions
1. Explore opportunities for application of the Topic Model* method in the Tourism domain.
2. Provide empirical evidence for their utility.
* Note that it is a family of many different methods.
Scenario 1: Item recommendation
• Users write reviews about topics that they care about (preference)
• Textual reviews that accompany an overall rating explain which aspects of the item were assessed in particular
“The hotel was right in the centre of the city, at walking distance from the city centre! Huge breakfast with nice food!”
Topic-Criteria model 1/3
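• User profiles (UP) are created from the topic distributions of the user's own reviews:

$$UP(i, t) = \frac{\sum_{d_{ij} \in D_i} P(t \mid d_{ij})}{|D_i|}$$

where D_i is the set of reviews written by user i, d_ij is the review of user i on item j, and P(t | d_ij) is the proportion of topic t in that review.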
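As a minimal sketch, assuming the user profile is simply the mean of the per-review topic distributions P(t | review) over all reviews a user has written, it could be computed as follows. The function name `user_profile`, the topic order and the numbers are hypothetical and only serve as illustration.

```python
def user_profile(review_topic_dists):
    """Average the topic distributions of all reviews written by one user."""
    if not review_topic_dists:
        raise ValueError("user has no reviews")
    num_reviews = len(review_topic_dists)
    num_topics = len(review_topic_dists[0])
    return [
        sum(dist[t] for dist in review_topic_dists) / num_reviews
        for t in range(num_topics)
    ]


# Hypothetical topic order: [location, food, rooms]; values are made up.
reviews_of_user = [
    [0.6, 0.4, 0.0],  # a review mostly about location and food
    [0.2, 0.2, 0.6],  # a review mostly about rooms
]
profile = user_profile(reviews_of_user)  # roughly [0.4, 0.3, 0.3]
```

Because every input row sums to one, the resulting profile is again a distribution over topics and can be read as how much the user tends to write about each topic.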
Topic-Criteria model 2/3
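• Item profiles (IP) are created from reviews and ratings, as the topic-weighted average of the ratings given to the item:

$$IP(j, t) = \frac{\sum_{d_{ij} \in D_j} P(t \mid d_{ij}) \cdot r_{ij}}{\sum_{d_{ij} \in D_j} P(t \mid d_{ij})}$$

where D_j is the set of reviews written about item j and r_ij is the overall rating that user i gave to item j.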
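A minimal sketch of the item profile, assuming it is the average rating of an item weighted by how strongly each review discusses a given topic. The function name `item_profile`, the use of `None` for topics that no review mentions, and the sample data are assumptions made for illustration.

```python
def item_profile(reviews):
    """Topic-weighted average rating of one item.

    reviews: list of (topic_distribution, overall_rating) pairs.
    """
    if not reviews:
        raise ValueError("item has no reviews")
    num_topics = len(reviews[0][0])
    profile = []
    for t in range(num_topics):
        topic_weight = sum(dist[t] for dist, _ in reviews)
        weighted_ratings = sum(dist[t] * rating for dist, rating in reviews)
        # Assumption: a topic never mentioned for this item gets no score.
        profile.append(weighted_ratings / topic_weight if topic_weight > 0 else None)
    return profile


# Hypothetical topic order: [location, food, rooms]; values are made up.
reviews_of_item = [
    ([0.5, 0.5, 0.0], 5),  # happy guest writing about location and food
    ([0.0, 0.5, 0.5], 2),  # unhappy guest writing about food and rooms
]
profile = item_profile(reviews_of_item)  # [5.0, 3.5, 2.0]
```

In this toy case the item scores high on location, low on rooms, and in between on food, because both the positive and the negative review mention food.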
Topic-Criteria model 3/3
• Prediction based on the sum of products over all topics
– Weight parameters w_t are fitted to the data
– Assumption: not all topics are equally influential

$$\hat{r}_{ij} = \sum_{t=1}^{T} UP(i, t) \cdot IP(j, t) \cdot w_t$$
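The prediction step can be sketched as a weighted sum over topics of user profile times item profile. This is a hedged illustration: the function name `predict_rating` and all numbers are hypothetical, and the slides do not describe how the weights are fitted, so they are simply passed in here.

```python
def predict_rating(user_profile, item_profile, weights):
    """Sum over topics of user interest * item score * topic weight."""
    if not (len(user_profile) == len(item_profile) == len(weights)):
        raise ValueError("all inputs need one entry per topic")
    return sum(u * i * w for u, i, w in zip(user_profile, item_profile, weights))


# Hypothetical topic order: [location, food, rooms]; values are made up.
user = [0.5, 0.5, 0.0]          # the user writes about location and food only
item = [5.0, 3.5, 2.0]          # the item is strong on location, weak on rooms
uniform_weights = [1.0, 1.0, 1.0]

predicted = predict_rating(user, item, uniform_weights)  # 4.25
```

With all weights set to 1 every topic contributes equally. Fitting the weights to data, as the slide describes, would let some topics influence the predicted rating more than others.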
Results for Scenario 1
RMSE by method and dataset (lower is better):

Method    YELP-5-5    YELP-10-10    TA-3-3    TA-5-5
KNN-IB    1.0709      1.0249        1.0531    0.9601
KNN-UB    1.1088      1.0424        1.0715    0.9447
PMF       1.0956      1.0389        1.0373    0.9946
TC        1.0706      1.0247        1.0625    0.9719
TC-W      1.0599      0.9955        1.0916    0.9776
• Evaluation on datasets from YELP (restaurants) and Tripadvisor (hotels) with different levels of sparsity
• Accuracy (RMSE) of the Topic-Criteria model is comparable to that of nearest-neighbor and matrix factorization approaches, but it yields richer user profiles and can explain which topics were considered in real user interactions.
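For reference, RMSE (root mean squared error) compares predicted with actual ratings; lower values mean more accurate predictions. The sketch below shows the standard definition. The function name `rmse` and the sample ratings are illustrative and unrelated to the datasets in the talk.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings: errors are 1, 0 and 2 stars.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])  # sqrt(5 / 3), about 1.29
```

Because the errors are squared before averaging, a single prediction that is far off weighs more heavily than several small deviations.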
Scenario 2: Analytics
• Anecdotal evidence on what topics might explain a good or bad rating for a service provider or a destination.
• BUT: risk of fallacies due to e.g. cherry-picking.
Topic "Cleanliness" in reviews of Orlando hotels:
dirty, mold, bugs, smelled, smell, filthy, carpet, musty, stained, disgusting, bed_bugs, black, mildew, moldy, stains, bites, dust, musty_smell, refund

Topic "Business" in reviews of New York hotels:
internet, free, free_internet, access, wireless, internet_access, wireless_internet, business_center, computers, free_wireless, business, boarding, gym, center, print, free_internet_access, printer, bottled, passes
Scenario 3: Automated Interpretation of reviews
• Automatically derive different properties from a review, such as:
– Rating value: extract topics from the written text and match them with the item profile; if a user writes about the strengths of the hotel, a high score is predicted
– Identify reviews whose associated rating value is or is not coherent with the predicted rating, in order to detect fake reviews or to rank more plausible reviews higher
– Identify reviews with more breadth / broader scope (see Daniel Leung's thesis)
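Two of these properties can be sketched in a few lines. This is only one possible reading: the tolerance of 1.5 rating points and the use of Shannon entropy as a measure of breadth are assumptions made here, not methods stated in the slides, and the function names `is_coherent` and `topic_breadth` are hypothetical.

```python
import math


def is_coherent(actual_rating, predicted_rating, tolerance=1.5):
    """True if the given rating is close to the rating predicted from the text."""
    return abs(actual_rating - predicted_rating) <= tolerance


def topic_breadth(topic_dist):
    """Shannon entropy (in bits) of a review's topic distribution.

    0 means the review covers a single topic; higher values mean
    the text is spread over more topics.
    """
    return -sum(p * math.log2(p) for p in topic_dist if p > 0)


plausible = is_coherent(actual_rating=5, predicted_rating=4.2)   # True
suspicious = is_coherent(actual_rating=5, predicted_rating=1.8)  # False
narrow = topic_breadth([1.0, 0.0])  # a single topic, entropy 0
broad = topic_breadth([0.5, 0.5])   # two equally covered topics, 1 bit
```

A review flagged as incoherent is not necessarily fake; the flag could serve as one signal among others for ranking or for manual inspection.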
Conclusions
• Several application scenarios for the Topic Model method in the tourism domain were identified
• Empirical evidence that the proposed Topic-Criteria model achieves results comparable to or better than those of baseline recommendation methods
• Future work:
– Different extensions of Topic Model methods employing supervised learning
– Contrasting derived topic distributions with real user assessments
Thank you for your attention!
Questions?
Markus Zanker
Intelligent Systems and Business Informatics
Alpen-Adria-Universität Klagenfurt, Austria
M: mzanker@acm.org
P: +43 463 2700 3753
Skype: markuszanker
W: http://www.isbi.at/mzanker
Visit: http://www.recommenderbook.net
Project OSTAR
• Development of an innovative online system for recommending individual tours and trails in alpine regions
– Research partners:
  • EURAC research, Bolzano, Italy
  • Free University Bolzano-Bozen, Italy
  • Autonomous Province of Bolzano – South Tyrol (Dept. for spatial and statistical informatics)
  • Alpen-Adria-Universität Klagenfurt
– Application partners:
  • Tourism regions in Carinthia and South Tyrol
– Runtime: 2012-2014
– Programme:
  • Interreg IV Italy-Austria