Online Feedback Correlation using Clustering
Research Work Done for CS 651: Internet Algorithms



DESCRIPTION

My presentation given for the Internet search class. I theorized that you could automatically determine how good a product is based on the different types of negative reviews it receives.


Page 1: Online feedback correlation using clustering

Online Feedback Correlation using Clustering

Research Work Done for CS 651: Internet Algorithms

Page 2

Dedicated to Tibor Horvath

Whose endless pursuit of getting a PhD (imagine that) kept him from researching this topic.

Page 3

Problem Statement

- Millions of reviews are available
- Consumers read only a small number of reviews
- Reviewer content is not always trustworthy

Page 4

Problem Statement (continued)

- What information from reviews is important?
- What can we extract efficiently from the overall set of reviews to provide more utility to consumers than is already provided?

Page 5

Motivation

- People are increasingly relying on online feedback mechanisms in making choices [Guernsey 2000]
- Online feedback mechanisms draw consumers and offer a competitive edge
- Quality is currently poor

Page 6

Current Solutions

- “Good” review placement
- Showing only a small number of reviews

... Are these more trustworthy?

Page 7

Amazon Example

Page 8

Observations

- Consumers look at a product based on its overall rating
- Consumers read the “editorial review” for content
- Reviews can indicate common issues

... Can we correlate these reviews in some meaningful way?

Page 9

Observations Lead to Hypotheses!

Hypothesis: Products with numerous similar negative reviews will often not be purchased, regardless of their positive reviews. Furthermore, the number of similar negative reviews is a strong indicator of the likelihood of certain flaws in a product.

Page 10

Definitions

- Semantic orientation: polar classification of whether something is positive or negative
- Natural language processing: deciphering parts of speech from free text
- Feature: a quality of a product that customers care about
- Feature vector: a vector representing a review in a d-dimensional space, where each dimension represents a feature
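To make the feature-vector definition concrete, here is a minimal sketch in Java. The feature names and the sample review orientations are hypothetical examples; the deck does not list the actual feature set used in the project.

```java
import java.util.*;

// Sketch: turn a review's per-feature orientations into a feature vector.
// The feature list below is a hypothetical example for mp3 players.
public class FeatureVector {
    static final List<String> FEATURES =
        List.of("battery", "screen", "sound", "size", "price", "software");

    // orientations: feature -> -1 (negative), 0 (not mentioned), +1 (positive)
    static int[] toVector(Map<String, Integer> orientations) {
        int[] v = new int[FEATURES.size()];
        for (int i = 0; i < FEATURES.size(); i++) {
            v[i] = orientations.getOrDefault(FEATURES.get(i), 0);
        }
        return v;
    }

    public static void main(String[] args) {
        // A review that pans the battery but praises sound and price.
        Map<String, Integer> review = Map.of("battery", -1, "sound", 1, "price", 1);
        System.out.println(Arrays.toString(toVector(review)));
        // prints [-1, 0, 1, 0, 1, 0]
    }
}
```

Each review becomes one point in feature space; a value of 0 marks a feature the review never mentions, which is what makes the vectors comparable across reviews.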

Page 11

Overview of Project

1. Obtain a large repository of customer reviews
2. Extract features from customer reviews and orient them
3. Create feature vectors, e.g. [1, 0, -1, 1, 1, -1, ...], from reviews and features
4. Cluster feature vectors to find large negative clusters
5. Analyze clusters and compare to the hypothesis
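The clustering step can be illustrated with a bare-bones k-means loop in Java. The project used the KMLocal library for this; the sketch below only shows the assign/update cycle on toy feature vectors, not the actual implementation.

```java
import java.util.*;

// Minimal k-means sketch over review feature vectors (illustrative only;
// the project delegated clustering to KMLocal).
public class KMeansSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Returns the cluster index assigned to each point; centroids are
    // updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int iters) {
        int[] assign = new int[points.length];
        for (int t = 0; t < iters; t++) {
            // Assignment step: each point joins its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (dist2(points[p], centroids[c]) < dist2(points[p], centroids[best]))
                        best = c;
                assign[p] = best;
            }
            // Update step: each centroid becomes the mean of its points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[points[0].length];
                int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assign[p] == c) {
                        n++;
                        for (int i = 0; i < sum.length; i++) sum[i] += points[p][i];
                    }
                if (n > 0)
                    for (int i = 0; i < sum.length; i++) centroids[c][i] = sum[i] / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        // Two clearly negative and two clearly positive toy reviews.
        double[][] reviews = {{-1, -1, 0}, {-1, 0, -1}, {1, 1, 0}, {1, 0, 1}};
        double[][] centroids = {{-1, -1, 0}, {1, 1, 0}};
        System.out.println(Arrays.toString(cluster(reviews, centroids, 10)));
        // prints [0, 0, 1, 1]
    }
}
```

A cluster whose centroid is dominated by negative orientations is a candidate "large negative cluster" in step 4.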

Page 12

Related Work

Related work has fallen into one of three disparate camps:

1. Classification: classifying reviews as negative or positive

2. Domain specificity: the overall effect of reviews within a domain

3. Summarization: feature extraction to summarize reviews

Page 13

Limitations of Related Work

- Classification: overly summarizing
- Domain specificity: hard to generalize given domain information
- Summarization: no overall knowledge of the collection

Page 14

Close to Summarization?

Most closely related to the summarization work of Hu and Liu: summarization with dynamic feature extraction and orientation per review.

Page 15

Data for Project

Data from Amazon.com customer reviews:
- Available through the Amazon E-Commerce Service (ECS)
- Four thousand products related to mp3 players
- Over twenty thousand customer reviews

Page 16

Technologies Used

- Java to program the modules
- Amazon ECS
- NLProcessor (trial version) from Infogistics
- Princeton’s WordNet as a thesaurus
- KMLocal from David Mount’s group at the University of Maryland for clustering

Page 17

Project Structure

Page 18

Simplifications Made

- Limited data set
- Feature list created a priori
- Features from the same sentence given the same orientation
- Sentences without features neglected
- Number of clusters chosen only to see correlations in the biggest cluster
- Small adjective seed set

Page 19

Analysis

- Associated clusters with products
- Found negative clusters using a threshold (-0.1)
- Eliminated non-negative clusters
- Sorted the product list twice:
  - Products by sales rank (given by Amazon)
  - Products sorted by the hypothesis, with a tweak: relative size * distortion
- Computed Spearman’s distance between the two orderings
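The deck does not say which variant of Spearman's distance was computed; one common choice, Spearman's footrule (the sum of absolute rank displacements between two orderings of the same items), can be sketched as follows. The product IDs are hypothetical.

```java
import java.util.*;

// Sketch of Spearman's footrule distance between two rankings of the
// same products (e.g. sales-rank order vs. hypothesis order).
public class SpearmanFootrule {
    static int footrule(List<String> rankA, List<String> rankB) {
        int d = 0;
        for (int i = 0; i < rankA.size(); i++) {
            // Displacement of item i between the two orderings.
            d += Math.abs(i - rankB.indexOf(rankA.get(i)));
        }
        return d;
    }

    public static void main(String[] args) {
        List<String> bySales = List.of("mp3-A", "mp3-B", "mp3-C", "mp3-D");
        List<String> byHypothesis = List.of("mp3-A", "mp3-C", "mp3-B", "mp3-D");
        System.out.println(footrule(bySales, byHypothesis));
        // prints 2 (B and C are each displaced by one position)
    }
}
```

A distance of 0 means the two orderings agree exactly; the larger the footrule, the more the hypothesis ranking disagrees with the sales ranking.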

Page 20

Results

The hypothesis predicts with 82% accuracy! But most of the four thousand products were pruned due to poor orientation.

Page 21

Conclusion

Consumers are affected by negative reviews that correlate to show similar flaws.

They are affected regardless of the positive reviews.

Page 22

Future Work

- Larger seed set for adjectives
- Use more complicated NLP techniques
- Experiment with the number and size of clusters
- Dynamically determine features using summarization techniques
- Use different data sets
- Use a different distance measure in clustering

Page 23

Questions