Summarization and Opinion Detection in Product Reviews
Team:
Suman Papanaboina ([email protected])
Swapnil Patil ([email protected])
Shubham Srivastava ([email protected])
Spandana Otra ([email protected])
Project Mentor:
Aditya Joshi ([email protected])
Project Motivation
• As e-commerce becomes more popular, the number of customer reviews that a product receives grows rapidly.
• For a popular product, the number of reviews can be in the hundreds or even thousands.
• This makes it difficult for a potential customer to read them and make an informed decision on whether to purchase the product.
• It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions.
Project Objective
• Provide a structured, feature-based summary for new customers by mining reviews.
How Is It Different from Traditional Summarization?
• We mine only the product features on which customers have expressed opinions, and whether those opinions are positive or negative.
• We do not summarize the reviews by selecting a subset of the original sentences or rewriting them to capture the main points, as classic text summarization does.
End-to-End Architecture
• Crawler
• Sentence Splitter/Preprocessor
• Feature/Opinion Extractor
• Frequent Feature Identifier
• Feature Pruner
• Sentiment Analyzer
• Summarizer
• Persistence (MySQL)
• REST Service
• UI
Crawler Module
• Flow: Flipkart → Jsoup scraping tool → Persister → MySQL
• Crawled information: product name, rating, review comment, commenting user, comment date/time
Sentence Splitter/Preprocessor
• Sentence splitter: Review → OpenNLP → MySQL Persister
• Preprocessor: Sentence → stop-word filter → stemming
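The preprocessing steps named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regex splitter stands in for OpenNLP's sentence detector, `STOP_WORDS` is a tiny hypothetical list, and `crude_stem` is a rough suffix stripper rather than the stemmer the project actually used.

```python
import re

# Hypothetical, tiny stop-word list; a real system would load a fuller one.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "it", "this", "are", "in"}

def split_sentences(review: str) -> list[str]:
    """Naive split after '.', '!' or '?' (a stand-in for OpenNLP)."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]

def crude_stem(word: str) -> str:
    """Very rough suffix stripping; this is not the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(sentence: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The battery is great.")` keeps only `battery` and `great`, since `the` and `is` are on the stop-word list.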
Feature/Opinion Extractor Module
• Flow: Sentence → Stanford Dependency Parser → extract nsubj, amod, nn → find any negations → Persister → MySQL
• We used the Stanford dependency parser.
• We extract only nsubj, amod, and nn pairs. These pairs turn out to be the required feature/opinion pairs.
• We identify any negations expressed and adjust the opinion accordingly.
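A minimal sketch of how feature/opinion pairs might be read off dependency relations is given below. It assumes the parser output has already been converted to `(relation, governor, dependent)` triples, which is a hypothetical representation chosen for illustration. Treating `nn` as a way to join compound nouns and `neg` as the negation marker are also assumptions; the slides do not spell out the exact rules.

```python
def extract_pairs(triples):
    """Return (feature, opinion, negated) tuples from dependency triples.

    Each triple is (relation, governor, dependent), e.g. ("nsubj", "good", "life").
    """
    # nn(head, modifier): join compound nouns such as "battery life".
    compounds = {gov: f"{dep} {gov}" for rel, gov, dep in triples if rel == "nn"}
    # neg(word, "not"): remember which opinion words are negated.
    negated = {gov for rel, gov, dep in triples if rel == "neg"}

    pairs = []
    for rel, gov, dep in triples:
        if rel == "nsubj":      # nsubj(opinion, feature): "battery is great"
            feature, opinion = dep, gov
        elif rel == "amod":     # amod(feature, opinion): "great battery"
            feature, opinion = gov, dep
        else:
            continue
        pairs.append((compounds.get(feature, feature), opinion, opinion in negated))
    return pairs
```

For the sentence "The battery life is not good", the triples `nn(life, battery)`, `nsubj(good, life)` and `neg(good, not)` yield the single pair `("battery life", "good", True)`, where the final flag tells a later stage to flip the opinion.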
Frequent Feature Identification
• We define a frequent feature as a feature that appears in more than 3 sentences (this threshold is configurable).
• We used the Apache Mahout library to find frequent patterns.
• Flow: Features and sentences → Mahout frequent pattern miner (FP-Growth/FP-tree) → Frequent features → Persister → MySQL
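The frequency threshold itself can be illustrated without Mahout. The sketch below is a plain per-sentence counter, not FP-Growth, and it only handles single features rather than multi-item patterns. The function name and the input shape (one list of features per sentence) are assumptions made for the example.

```python
from collections import Counter

def frequent_features(sentence_features, min_sentences=3):
    """Keep features that occur in more than `min_sentences` sentences."""
    counts = Counter()
    for features in sentence_features:
        counts.update(set(features))  # count each feature once per sentence
    return {f: n for f, n in counts.items() if n > min_sentences}
```

With the default threshold, a feature mentioned in exactly 3 sentences is dropped and one mentioned in 4 is kept, which matches the "more than 3 sentences" rule stated above.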
Redundancy Pruning
• We define a feature X as redundant if:
– X is part of another feature, and
– X does not appear on its own in at least 3 sentences (the threshold is configurable; our system currently uses 3).
• After implementing this technique, we are able to eliminate redundant features such as battery and life when battery life is also a feature.
• Example: battery, life, battery life → Redundancy Pruner → battery life
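One way to read the redundancy rule is sketched below. It assumes that "part of another feature" means the words of X are a proper subset of the words of a longer feature, and that "appears on its own" means X occurs in a sentence where none of those longer features occurs. Both readings, and the function and parameter names, are assumptions for illustration.

```python
def prune_redundant(features, sentence_features, min_standalone=3):
    """Drop features that are part of a longer feature and rarely occur alone."""
    def is_part_of(short, long):
        # Proper-subset test on words: "battery" is part of "battery life".
        return set(short.split()) < set(long.split())

    kept = []
    for x in features:
        supersets = [y for y in features if is_part_of(x, y)]
        if not supersets:
            kept.append(x)
            continue
        # Sentences mentioning x without any longer feature that contains it.
        standalone = sum(
            1 for feats in sentence_features
            if x in feats and not any(y in feats for y in supersets)
        )
        if standalone >= min_standalone:
            kept.append(x)
    return kept
```

If "battery" shows up alone in only one sentence while "battery life" is also a feature, "battery" is pruned; lowering `min_standalone` to 1 would keep it.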
Junk Features
• Some reviews contain sentences such as "Flipkart services are awesome". In this case our system extracts "service" as the feature and "awesome" as the opinion, even though the sentence is not about the product.
• Flow: Frequent features + junk feature file → Junk Feature Pruner → Output features
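Filtering against a junk feature file could look like the sketch below. The file format (one feature per line, with blank lines and `#` comments ignored) is an assumption, since the slides do not describe it, and both function names are hypothetical.

```python
def load_junk_features(lines):
    """Read one junk feature per line; skip blanks and '#' comments (assumed format)."""
    junk = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            junk.add(entry)
    return junk

def prune_junk(features, junk):
    """Keep only features that are not on the junk list (case-insensitive)."""
    return [f for f in features if f.lower() not in junk]
```

Because `load_junk_features` accepts any iterable of lines, it can be passed an open file object directly, for example `load_junk_features(open(path))` with a path of your choosing.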
Sentiment Analysis
• Flow: Opinion words → Sentiment Analyzer (SentiWordNet, positive seed list, negative seed list)
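The slide names three resources but not how they are combined. The sketch below assumes one plausible order: check the seed lists first, fall back to lexicon scores, then flip the result if the opinion was negated. The seed words and the `LEXICON` dictionary are made-up stand-ins; in particular, `LEXICON` only imitates the shape of SentiWordNet scores and does not use its real data or API.

```python
POSITIVE_SEEDS = {"good", "great", "awesome", "excellent"}
NEGATIVE_SEEDS = {"bad", "poor", "terrible", "slow"}
# Stand-in for SentiWordNet: word -> (positive score, negative score); values are made up.
LEXICON = {"decent": (0.5, 0.0), "flimsy": (0.0, 0.625)}

def polarity(opinion, negated=False):
    """Classify an opinion word as 'positive', 'negative' or 'neutral'."""
    word = opinion.lower()
    if word in POSITIVE_SEEDS:
        label = "positive"
    elif word in NEGATIVE_SEEDS:
        label = "negative"
    else:
        pos, neg = LEXICON.get(word, (0.0, 0.0))
        label = "positive" if pos > neg else "negative" if neg > pos else "neutral"
    # A negation ("not good") flips the polarity; neutral stays neutral.
    if negated and label != "neutral":
        label = "negative" if label == "positive" else "positive"
    return label
```

Under these assumptions, "good" with a negation is classified as negative, and a word found in neither the seed lists nor the lexicon stays neutral.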
Summarizer
• The summarizer generates a structured, feature-based summary.
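A feature-based summary of this kind can be sketched as a grouping step: collect the sentences for each feature into positive and negative buckets, then print the counts. The record shape `(feature, polarity, sentence)` and the output layout are assumptions for illustration, not the project's actual format.

```python
from collections import defaultdict

def summarize(opinions):
    """Group (feature, polarity, sentence) records into per-feature buckets."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, polarity, sentence in opinions:
        if polarity in ("positive", "negative"):
            summary[feature][polarity].append(sentence)
    return dict(summary)

def render(summary):
    """Format one line per feature with positive and negative sentence counts."""
    lines = []
    for feature in sorted(summary):
        buckets = summary[feature]
        lines.append(f"{feature}: +{len(buckets['positive'])} / -{len(buckets['negative'])}")
    return "\n".join(lines)
```

Keeping the sentences in each bucket, rather than only the counts, lets a UI drill down from a feature to the individual sentences behind it.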
[Figure: Feature Summary]
REST Service
• We implemented a REST service that provides the following functionality to the UI:
– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product
• We used the Grizzly embedded container to implement the REST service.
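The three operations can be pictured as three GET routes. The sketch below is a plain dispatch function, not a Grizzly service. The URL paths, the in-memory `CATALOG` and `SUMMARIES` data, and the JSON shapes are all hypothetical, since the slides do not give the real endpoints.

```python
import json

# Hypothetical in-memory data standing in for the MySQL tables.
CATALOG = {"mobiles": ["phone-a", "phone-b"]}
SUMMARIES = {"phone-a": {"battery life": {"positive": 12, "negative": 3}}}

def handle(path):
    """Map a GET path to a (status, JSON body) pair. All paths are illustrative."""
    parts = [p for p in path.split("/") if p]
    # GET /categories
    if parts == ["categories"]:
        return 200, json.dumps(sorted(CATALOG))
    # GET /categories/<category>/products
    if len(parts) == 3 and parts[0] == "categories" and parts[2] == "products":
        if parts[1] in CATALOG:
            return 200, json.dumps(CATALOG[parts[1]])
    # GET /products/<product>/summary
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "summary":
        if parts[1] in SUMMARIES:
            return 200, json.dumps(SUMMARIES[parts[1]])
    return 404, json.dumps({"error": "not found"})
```

Unknown categories, unknown products, and unrecognized paths all fall through to the same 404 response.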
UI
• Screenshot: Home page
• Screenshot: Feature-based summary
• Screenshot: Individual sentences
• Screenshot: Complete review
Evaluation
• No. of feature-opinion pairs manually extracted: 20
• No. of initial feature-opinion pairs extracted by our system: 40
• After frequent pattern mining: 25
• After pruning (final stage): 18
• No. of correct feature-opinion pairs: 15
• No. of incorrect feature-opinion pairs: 3
• Precision: 15/20 (75%)
• Recall: 18/20 (90%)
• F1-measure, (2 × precision × recall)/(precision + recall): 0.81
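The arithmetic can be checked with a short script. The slide computes precision as 15/20 and recall as 18/20. Under the conventional definitions (precision = correct / extracted, recall = correct / manually identified), the same counts would give different values, so the sketch computes both for comparison. The variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

manual, final, correct = 20, 18, 15  # counts taken from the slide

# As reported on the slide.
reported_precision = correct / manual   # 15/20 = 0.75
reported_recall = final / manual        # 18/20 = 0.90
reported_f1 = f1(reported_precision, reported_recall)

# Conventional definitions applied to the same counts.
std_precision = correct / final         # 15/18
std_recall = correct / manual           # 15/20
std_f1 = f1(std_precision, std_recall)

print(f"reported: P={reported_precision:.2f} R={reported_recall:.2f} F1={reported_f1:.3f}")
print(f"standard: P={std_precision:.2f} R={std_recall:.2f} F1={std_f1:.3f}")
```

With the slide's own precision and recall the F1 value is about 0.818; with the conventional definitions it is about 0.789.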
Conclusion
• It was a great learning experience for all of us. We were excited to apply data mining and natural language processing techniques to implement the system.
• We believe this system can help users quickly identify what is good or bad about a product based on other users' comments. It also gives manufacturers a better perspective on users' comments, which can aid in providing business intelligence.
Future Enhancements
• Add more rules to improve the overall accuracy of feature/opinion identification.
• Migrate the entire system to run on Hadoop YARN, using HBase instead of MySQL.
• Try unsupervised/supervised machine learning approaches for feature/opinion identification.
• Replace our home-grown crawler with a more robust, open-source crawler, Apache Nutch (https://nutch.apache.org/).