Hybrid Sentiment Analysis- A Survey of Different
Approaches and Techniques
Jayashree Jagdale
PhD Scholar, Computer Engineering
Pacific University (PAHER), India
Dr. Emmanuel M.
Professor, Information Technology
Pune Institute of Computer Technology
Abstract—Sentiment Analysis has been attracting the
attention of researchers for some time. Studies span the various methods
applied to extract opinions (called sentiments or even emotions),
data resource building and cross-domain classification. Capturing
public opinion about social events, political movements, company
strategies, marketing campaigns, and product preferences is
garnering increasing interest from the scientific community (for
the exciting open challenges) and from the business world. This
paper presents a survey of the different ways in which researchers
have employed hybrid models for sentiment analysis using
lexicon and machine learning techniques. The techniques are
compared and reviewed to understand the applications and to
provide a roadmap of hybrid sentiment analysis.
Keywords—Sentiment Analysis, Lexicon Machine Learning,
Neural Network, Human Behaviour.
I. INTRODUCTION
Sentiment analysis is a rapidly growing area that deals with
automatically extracting people’s opinions, emotions or attitudes from unstructured data, and hence understanding the sentiments. It is essentially an outgrowth of Information Retrieval.
It is also called opinion mining or subjectivity analysis.
These opinions are much-needed input for a people-centric
market, forming authentic feedback for businesses about their
products and services. They provide businesses with
opportunities and insight to deal with market competition.
Researchers have been using data from social networking
sites, blogs, chats, news, review websites and a long list of
other resources for sentiment analysis. During conversations,
knowingly or unknowingly, people express their opinions
through comments and discussion forums. Many applications could benefit, including entertainment and product review
mining, product reputation analysis, spam filtering and
tracking sentiments toward events. Intelligent systems have
been built to extract facts associated with terrorist incidents,
disease outbreaks, plane crashes, vehicle launches,
management succession, joint ventures, corporate acquisitions,
and job and seminar announcements, and to predict stock
market scenarios, environmental conditions, business
closures and more. Sentiment analysis can be done at the
word level, the sentence level or on a document as a whole.
At the document level, the sentiments in the entire document are
aggregated, on the assumption that the entire document
discusses the same feature. At the sentence level, each sentence is
analyzed to extract the sentiment of the whole sentence, and subjective
sentences are classified as positive or negative. At the aspect level,
the sentiment toward specific aspects of entities is studied [1].
II. RELATED WORK IN SENTIMENT ANALYSIS
One line of work checks for the presence of words expressing
positive or negative sentiments, making use of data repositories
such as SentiWordNet, where dictionary entries are marked with
the scores pos, neg and obj (for positive, negative and objective,
respectively), SentiFul, etc. In information retrieval and natural
language processing, the bag-of-words model, a form of Vector
Space Modelling, converts unstructured text (a sentence, a
paragraph or a document) into a structured form that contains a
vector of words and their relationship with the documents,
disregarding grammar and even word order. At a later stage,
word n-gram and character n-gram features were introduced, and
they were observed to lead to the best results. The authors of [1]
drew on WordNet, using words from distinct dictionaries as
sources of sentiment-carrying base components and relating the
patterns by which compounds are formed. If a set of adjectives is
given predetermined orientation tags such as positive or negative,
and pairs of adjectives are joined by conjunctions such as “or”, “and”, “either-or”, “but” and “neither-nor”, it becomes possible to predict the orientation of the two conjoined adjectives, as in “A beautiful and fresh fruit” or “A good script but poor dialogues” [2].
A method based on semi-supervised minimum-cut algorithms and distributional similarity is used to assign subjectivity tags to word senses. One idea is to add this polarity information to the adjectives in WordNet. Two anchor words at the extremes of the polarity spectrum (“excellent” and “poor”) are chosen, and the pointwise mutual information (PMI) of an adjective W with each anchor is combined as Polarity Score(W) = PMI(W, excellent) - PMI(W, poor). Researchers also consider the intensity of the sentiment: “the movie was awesome” carries a higher positive value than “the movie was nice”. A number of approaches exist to study the underlying affective state or grammatical options: Semantic Trees, Keyword Spotting,
JASC: Journal of Applied Science and Computations
Volume VI, Issue V, May/2019
ISSN NO: 1076-5131
Page No:2718
Latent Semantic Analysis, Rule-Based Modelling, Transformation-Centered Learning, World Knowledge Modelling, Key Phrase Spotting, and Naive Bayesian Networks. As the data is large, it becomes necessary to classify it for efficiency. The survey papers [3][4][5] give a comprehensive overview of recent developments in this field. Their authors classify the approaches used for sentiment analysis as lexicon based, machine learning and hybrid, and discuss various algorithms applied to different domains. These studies can be extended by applying the methods to other data sets or by applying different algorithms at various phases of sentiment analysis. The survey [3] takes a closer look at fields such as task/objective (sentiment identification, resource building, etc.), domain orientation, algorithms used, polarity, data scope, data set/source, and language (if cross-domain). The creation of lexicons is termed Building Resources; its aim is to create corpora in which opinion expressions are annotated according to their polarity, and sometimes dictionaries [3]. The authors of [4] extend the survey to a much larger body of articles, which they classify by the granularity at which the sentiment analysis is done: document level, sentence level, word level, aspect level, concept level, phrase level, link based, clause level and sense level. This extensive survey helps beginners in the area. It not only discusses the steps needed for sentiment analysis but also lists the tools available to carry out the various steps, and it compares the accuracies reported in the papers that implement the algorithms.
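The PMI-based polarity score described earlier in this section can be illustrated with a minimal Python sketch. The toy corpus, the document-level co-occurrence counting and the smoothing constant eps are assumptions made only for illustration; the surveyed work estimates these probabilities from far larger text collections.

```python
import math

def pmi(word, anchor, docs, eps=0.01):
    """Pointwise mutual information from document-level co-occurrence.

    docs is a list of token sets. eps is an assumed smoothing constant
    that keeps unseen pairs from producing log(0).
    """
    n = len(docs)
    n_word = sum(1 for d in docs if word in d)
    n_anchor = sum(1 for d in docs if anchor in d)
    n_both = sum(1 for d in docs if word in d and anchor in d)
    p_word = (n_word + eps) / n
    p_anchor = (n_anchor + eps) / n
    p_both = (n_both + eps) / n
    return math.log2(p_both / (p_word * p_anchor))

def polarity_score(word, docs, pos_anchor="excellent", neg_anchor="poor"):
    """Polarity Score(W) = PMI(W, excellent) - PMI(W, poor)."""
    return pmi(word, pos_anchor, docs) - pmi(word, neg_anchor, docs)

# Hypothetical four-document corpus, one token set per document.
docs = [set(s.split()) for s in [
    "excellent phone with a superb screen",
    "superb camera excellent battery",
    "poor battery and a dull screen",
    "dull design poor build",
]]

print(polarity_score("superb", docs) > 0)  # co-occurs with "excellent"
print(polarity_score("dull", docs) < 0)    # co-occurs with "poor"
```

A positive score places the word nearer the "excellent" end of the spectrum and a negative score nearer the "poor" end; the magnitude can serve as a rough intensity.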
III. STATE OF THE ART IN SENTIMENT ANALYSIS
The steps to sentiment analysis can be roughly shown as:
Figure 1: Sentiment analysis process
Treating raw data includes tokenizing, eliminating tags,
removing stopwords, and discarding punctuation and other
symbols. Pre-processing also includes tasks like stemming and
lemmatization.
2.1 Data Collection: A huge amount of data is available on the web for study and
analysis. Depending upon the aim of the study, a data set can be chosen from the widely available ones or, if none is suitable,
one has to be built. If a model is to be tested for accuracy and
is not domain specific (unlike, say, one for the medical domain), any closely
matching data set (one satisfying the number and type of attributes needed)
can be chosen.
2.2 Data pre-processing: Pre-processing involves all the natural language processing steps that ensure sentiments are recognised correctly.
2.2.1 Tokenization: Tokenization is an NLP step that breaks a sentence into
words, phrases, symbols or other meaningful tokens, removing
punctuation marks in the process, e.g. ‘isn’t’ becomes ‘isnt’.
2.2.2 Stopword removal: Stopwords are frequently occurring words, e.g. ‘the’, ‘is’, ‘a’, that play no
role in the analysis because they carry no information.
Keeping stopwords increases the dimensionality of the
problem and hence makes classification more difficult
and less effective. The stopword list can be domain specific, and punctuation is removed as well.
2.2.3 Stemming/Lemmatization: A word may have many forms, e.g. ‘connect’, ‘connected’, ‘connection’, ‘connectionless’. They carry similar meaning and hence can be kept in the root
form. This reduces index size and search time.
2.2.4 Indexing / Synonym and/or Antonym Grouping: This step is optional; it is employed by a few
modern methods devised to improve accuracy.
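The pre-processing steps above can be sketched in a few lines of Python. The stopword set and the suffix list are toy assumptions chosen to reproduce the examples in the text; a real system would use a full stopword list and an established stemmer or lemmatizer.

```python
import re

# Illustrative stopword set (assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to"}
# Toy suffix list for stemming, checked longest first (assumption).
SUFFIXES = ("ionless", "ion", "ing", "ed", "s")

def tokenize(text):
    """Lower-case, drop apostrophes so "isn't" becomes "isnt", split on non-alphanumerics."""
    text = text.lower().replace("'", "").replace("’", "")
    return re.findall(r"[a-z0-9]+", text)

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in remove_stopwords(tokenize(text))]

print(preprocess("The connection isn't connected."))
```

With this toy suffix list, ‘connect’, ‘connected’, ‘connection’ and ‘connectionless’ all reduce to the same root, which is what shrinks the index.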
2.3 Feature Selection in Sentiment classification
2.3.1 Term presence and frequency: Individual words or word n-grams, along with their frequency counts, are called
features. Term presence is denoted by a one, and term frequency
by a positive integer giving the number of times the term is
present in the document. Researchers have experimented with and
devised various term weighting techniques, from TF-IDF [6]
to Positive Impact Factor [7].
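As a rough illustration of term presence, term frequency and TF-IDF weighting, consider the sketch below. It assumes one common TF-IDF variant (raw count times log(N/df)); [6] discusses several alternatives, and the two example documents are hypothetical.

```python
import math
from collections import Counter

def term_vector(tokens, vocab, presence_only=False):
    """Frequency vector over vocab, or a 0/1 presence vector."""
    counts = Counter(tokens)
    return [min(counts[t], 1) if presence_only else counts[t] for t in vocab]

def tf_idf(docs):
    """Weight each term by tf * log(N / df); one variant among many."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["good", "good", "phone"], ["bad", "phone"]]
vocab = sorted({t for d in docs for t in d})  # ['bad', 'good', 'phone']

print(term_vector(docs[0], vocab))                      # frequency
print(term_vector(docs[0], vocab, presence_only=True))  # presence
print(tf_idf(docs)[0])
```

The term ‘phone’ occurs in every document, so its weight is zero under this variant, while ‘good’ is weighted up because it is both frequent in its document and rare in the collection.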
2.3.2 Parts of speech (POS):
Nouns, adjectives and adverbs are extracted. They add
information to the system, which helps the analysis.
2.3.3 Opinions: These are words commonly used to express opinions, like
‘good’, ‘best’, ‘love’ or ‘hate’. Opinions are also expressed in phrases, and they may not be expressed explicitly; this is
called the implicit presence of opinions. For example: ‘it took me lifetime to figure out how to …’.
Figure 1 (block labels): Data set, Collect Data, Pre-process data, Select Features, Classify sentiment, Sentiment Polarity
Figure 2: Types of sentiment classification techniques. Sentiment Classification divides into Lexicon Based, Machine Learning and Hybrid; Machine Learning divides into Supervised and Unsupervised.
Different approaches have been tried in order to suit the
domain or dataset, or to improve performance in terms of
accuracy. They can be broadly classified into three main
categories: lexicon based, machine learning and hybrid [3].
2.4 Sentiment Classification: Classification is the task of assigning a sentiment to an entity,
which may be a document, a sentence or an aspect. There are various
methods, as depicted in Figure 2.
Techniques for classification
2.4.1 Lexicon based: Under lexicon-based approaches, one can use either a dictionary-based
or a corpus-based approach. The dictionary-based approach uses
an existing dictionary, which is a collection of opinion words
along with their positive (+ve) or negative (-ve) sentiment
strength. Such dictionaries have been created with or without using
an ontology. The corpus-based approach relies on the probability of
occurrence of a sentiment word in conjunction with a positive or
negative set of words, estimated by searching very large
amounts of text, e.g. through Google or AltaVista search.
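A dictionary-based classifier can be sketched as a lookup and a sum. The lexicon below is a hypothetical handful of words with made-up strengths; it stands in for a real resource such as SentiWordNet, whose format and scores are different.

```python
# Hypothetical opinion lexicon: word -> signed sentiment strength.
LEXICON = {
    "good": 1.0, "best": 2.0, "love": 2.0,
    "bad": -1.0, "poor": -1.5, "hate": -2.0,
}

def lexicon_score(tokens, lexicon=LEXICON):
    """Sum the strengths of all opinion words; unknown words count as 0."""
    return sum(lexicon.get(t, 0.0) for t in tokens)

def classify(tokens, lexicon=LEXICON):
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("i love this phone".split()))
print(classify("a good script but poor dialogues".split()))
print(classify("what are the features".split()))
```

Summing strengths is the simplest aggregation; it ignores negation, intensity modifiers and word sense, which is part of the motivation for the hybrid methods surveyed later.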
2.4.2 Machine learning: Machine learning generally yields higher accuracy, while semantic
orientation (lexicon-based) approaches provide better generality. Machine learning can be
further divided into supervised and unsupervised approaches.
Some of the classifiers used in hybrid models are the supervised
learning methods Decision Tree (DT), SVM, Neural
Network (NN) and Naïve Bayes.
2.4.2.1 Supervised: Supervised learning techniques are those where tagged
training data is available for training the algorithms.
The supervised methods most frequently used for sentiment
classification are SVM, Naïve Bayes classifiers and
Decision Trees.
a) Naive Bayes
A Naive Bayes classifier is based on Bayes’ rule of
probability. The model makes a simplifying
conditional independence assumption: given the class, the words
of a document are treated as independent of one another [3]. A
document is assigned to the class with the highest posterior
probability computed with Bayes’ theorem.
b) Decision Tree
As the name suggests, it is a tree-based approach in which
internal nodes represent features/aspects, edges represent conditions
on feature values, and leaf nodes represent
the categories that result from the decisions made at the internal nodes. A top-down approach is followed: starting from the
condition at the root, traversal proceeds
downwards until a class (leaf) is reached.
c) Support Vector Machines
In comparisons, SVM is preferred over Naïve Bayes in many
applications. It can use the kernel trick to
separate data that is not linearly separable. SVM places each data point in
a space and finds the hyperplane that separates the classes with
the widest margin, and it often gives high accuracy in text
classification problems. Test data points are mapped into this
same space and are classified according to the side of the
hyperplane on which they fall [3].
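Of the supervised classifiers above, Naive Bayes is the simplest to sketch without external libraries. The following is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing; the smoothing choice and the four training documents are assumptions for illustration, not taken from any surveyed paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, model):
    """Return the class with the highest log posterior.

    Words are treated as conditionally independent given the class,
    so their log likelihoods are simply added.
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are skipped
                logp += math.log((word_counts[label][t] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical tagged training data.
train = [
    ("good great film".split(), "pos"),
    ("love this great phone".split(), "pos"),
    ("bad boring film".split(), "neg"),
    ("hate this poor phone".split(), "neg"),
]
model = train_nb(train)
print(predict_nb("great film".split(), model))
print(predict_nb("boring phone".split(), model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.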
2.4.3 Hybrid: The hybrid approach combines both approaches and is very
common, with sentiment lexicons playing a key role in the
majority of methods. The various approaches and the most
popular algorithms are as mentioned before. Machine learning
has been applied at various stages of analysis, from pre-processing to sentiment classification, combined with
lexicon approaches to improve the accuracy of
classification and prediction.
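One possible way to combine the two families, shown here only as a sketch, is to trust the lexicon when its score is decisive and to fall back on a trained classifier otherwise. The threshold, the lexicon and the stand-in classifier are all hypothetical; the surveyed papers combine lexicons and learning in many other ways, for example by feeding lexicon scores to the classifier as features.

```python
def hybrid_classify(tokens, lexicon, ml_classifier, threshold=1.0):
    """Lexicon first, machine learning as fallback (one hybrid design among many).

    ml_classifier is any callable that maps a token list to a label.
    """
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return ml_classifier(tokens)

# Hypothetical lexicon and a placeholder for a trained model.
lexicon = {"good": 1.0, "bad": -1.0}

def stand_in_classifier(tokens):
    # Placeholder rule standing in for a trained SVM or Naive Bayes model.
    return "negative" if "petrol" in tokens else "positive"

print(hybrid_classify("good phone".split(), lexicon, stand_in_classifier))
print(hybrid_classify("the vehicle consumes lot of petrol".split(),
                      lexicon, stand_in_classifier))
```

The second sentence contains no lexicon word, so the decision is delegated to the learned component, which is where implicit sentiment has a chance of being captured.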
IV. COMPARISON AND REVIEW ANALYSIS
2.5 Issues in sentiment analysis: Sentiment analysis faces many challenges. The general challenges
discussed in the literature are handling negation
based on the word’s position in the sentence, handling
polysemy, mapping slang and extended words, domain
generalization, opinion object identification, maintaining
opinion time, language generalization, feature matrix
construction, hidden sentiment identification, and updating/down-dating lexicons. Researchers study these challenges
to find solutions for classifying or
predicting sentiments. Specific issues are listed below.
1. Domain-specific meaning: A word may carry different
meanings depending on where it is used. In one case it may
have a positive sentiment whereas in another it may carry a
negative sentiment. E.g. ‘The resolution of the system is high
and the response time is also high.’ In this sentence the first
‘high’ shows a positive sentiment (high resolution) but the
second ‘high’ shows a negative sentiment (high response time).
2. Interrogative sentences: A question or a query may not carry any sentiment.
E.g. ‘What are the good features of an Activa?’
3. Sarcastic sentences: Sarcasm is a way of expressing a
negative sentiment using words that carry positive sentiments.
Recognizing sarcasm is a very challenging task. E.g. ‘You
sing so well, sounds like someone clearing his throat!!’
4. Implicit sentiments: Sometimes sentiments
are not stated explicitly using words like ‘good’, ‘bad’ or ‘beautiful’, but the sentence expresses them implicitly. Domain knowledge is a must to classify such sentences. E.g.
‘The vehicle consumes lot of petrol.’
5. Natural language usage changes from place to place: Some words may be abbreviated or used in short form, for instance by
youngsters. E.g. ‘legitimate’ becomes ‘legit’, and ‘Amazon Prime’ is called ‘prime’. Analyzing such sentences may
require some critical decision making.
6. Conditional sentences: Conditional statements may not
clearly specify the sentiment. E.g. ‘If the picture quality of this
mobile camera is good I will buy the phone.’
7. Understanding gap: Authors and readers may have different
perspectives based on nationality, religion, political
orientation, etc. E.g. ‘X political party won the elections’. This sentence can carry both a positive and a negative meaning, and its value varies from person to person: it has a
positive sentiment for people belonging to the party and
a negative sentiment for supporters of other
parties.
8. Spam reviews: Spam sentiments are those
posted by an opposing or competitor organization to
increase the value of its own product or organization
among users. Some politicians may use the same kind of spam
review just for publicity.
9. Sentiments can be unrelated to the core issue: Sometimes opinions may not be related to the product or issue. E.g. ‘I love this phone as it was gifted to me by my father.’
10. Domain dependence: The model or the algorithms
designed may become domain dependent and hence less
adaptable to changes.
11. Morphology of languages: Languages around the world
have different syntax and semantics. Some languages are
morphologically rich and hence require different data
structures and more complex ways of processing sentences.
12. Compound or multi-dimensional sentiments: Multiple (possibly
opposite) sentiments may be expressed in the same
sentence, with varying intensity. E.g. ‘I love watching Friends, but sometimes am bored of the repetitive jokes’
13. Handling polysemy: A word with several meanings makes it difficult to associate the
opinion with the right feature. E.g. the word ‘cross’ has several meanings, and the sentence ‘The bark was painful’ can be read in more than one way [8].
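Among the general challenges listed at the start of this section, position-based negation handling is easy to illustrate. The sketch below flips the polarity of lexicon words that fall within a fixed window after a negation word; the negator list, the window size and the lexicon are assumptions, and real systems often determine negation scope from a syntactic parse instead.

```python
# Illustrative negation words, in the tokenized form used earlier ("isn't" -> "isnt").
NEGATORS = {"not", "no", "never", "isnt", "dont"}

def score_with_negation(tokens, lexicon, window=3):
    """Sum lexicon strengths, inverting those within `window` tokens after a negator."""
    score = 0.0
    negate_left = 0  # how many upcoming tokens are still inside the negation scope
    for t in tokens:
        if t in NEGATORS:
            negate_left = window
            continue
        value = lexicon.get(t, 0.0)
        if negate_left > 0:
            value = -value
            negate_left -= 1
        score += value
    return score

lexicon = {"good": 1.0, "bad": -1.0}  # hypothetical lexicon
print(score_with_negation("the film is good".split(), lexicon))
print(score_with_negation("the film is not good".split(), lexicon))
```

A fixed window is a crude approximation of scope: it handles ‘not good’ but will also wrongly flip opinion words that merely happen to follow a negator closely.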
1.2 Literature survey: Researchers have published studies and surveys of the different tasks
carried out in models where hybrid parameters are
explored to improve the efficiency of the system. To give
a 360-degree view of the field and to make in-depth
knowledge available, researchers are leaving no aspect unexplored. They have studied, surveyed and
applied approaches and algorithms to carry out the task.
The aim here is to explore the use of hybrid methods, which use
the above-mentioned methods (and more), in sentiment
classification or prediction.
Table 1: Applications with hybrid sentiment analysis (Note: the table is given after the Conclusions)
A hybrid approach can be defined as the combination of supervised
and/or unsupervised machine learning algorithms with
techniques from natural language processing. The sentiment analysis process encompasses various phases.
These phases may be designed taking into consideration the
applicability of methods that are traditional or evolutionary.
To make applications available in real time, there is a need to work on the efficiency and accuracy of the algorithms.
The authors of [20] used a deep convolutional neural network to
exploit character- to sentence-level information for sentiment analysis of short
texts. They worked on the Stanford Twitter Sentiment corpus (STS, Twitter
messages) and the Stanford Sentiment Treebank (SSTb, movie
reviews), reporting an accuracy of 85.7% for SSTb and 86.4% for STS.
Researchers target various applications, such as movie reviews and news. A model is built and
tested on data sets from various
domains; the choice of data set depends on its availability, its ease of use,
and whether it contains enough attributes, in number and variety,
to test the method in question.
In paper [9], a simple CNN with little hyperparameter tuning
and static vectors is shown to achieve excellent results on
multiple benchmarks. In paper [21], the authors use a
hybrid technique combining rule-based and machine learning
models. Syntactic rules are defined that can be used to
efficiently extract aspects and opinions from a multi-label classifier. Machine learning is employed to learn the fitness of
the syntactic rules defined. Review highlights make way for
various relation extraction techniques for noise elimination or
minimization. Apart from these studies, various authors have
employed hybrid models for morphologically rich
languages (MRLs) such as Arabic and Malayalam, and even German, and for
multilingual sentiment analysis, exploiting large numbers of tweets or
Amazon movie reviews with lexicon-based,
supervised and unsupervised algorithms to achieve the best
accuracy. Experiments are performed with tools like Weka and
R using SentiWordNet. Multi-class sentiment analysis
has explored real-time micro-blogs, a Weibo-based user profiling system, camera review data, SemEval2014, and electronic
product and restaurant datasets, employing techniques like
TF-IDF, Naïve Bayes, LDA, SVM, GBDT, LSTM, RNN and
lexicon-based methods. Different techniques are explored, like hybrid
hierarchical classification methods for extracting adjectives
for implicit aspects and for extracting new words. Most are aspect-based
studies leading to fine-grained extraction with local and
global attention networks.
Table 2: Comparison of hybrid approaches (Note: the table is given after the Conclusions)
V. CONCLUSIONS
This paper presents a survey of the techniques used by
researchers in sentiment analysis through the hybrid approach.
Many researchers are turning their attention from lexicon and
machine learning techniques towards using neural networks
at different phases of analysis to achieve better
results. The techniques are compared and reviewed to
understand the applications and to provide a roadmap of hybrid
sentiment analysis. Authors have focused on
dimensionality reduction and optimization of the operations.
Computational cost and complexity of implementation used to make
hybrid models difficult to implement. With recent advancements in hardware and decreasing cost, it has become relatively
easy to experiment with variations. To make applications
available in real time, there is a need to work on the efficiency and accuracy of the algorithms. Resource building, cross-domain
classification, and capturing public opinion about social events,
political movements, company strategies, marketing
campaigns, and product preferences are garnering increasing
interest from the scientific community (for the exciting open
challenges) and from the business world.
Table 1: Applications with hybrid sentiment analysis.
| SN | Ref. | Application | Methods used | Dataset | Remarks |
| 1 | [10] | News | Lexicon, KNN and SVM | www.theguardian.com | Use of Scrapy 1.0.1 in Python, MongoDB. Technology, Politics and Business sections of news are considered. |
| 2 | [11] | News and market | Expert and hybrid weighting scheme | Thomson Reuters news stories | Use of crowd sourcing and experts in different configurations and Matthews correlation coefficient (MCC) using the ANOVA procedure. |
| 3 | [12] | Crime detection | Deep learning, LDA | Twitter posts | Use of the graph database model; output as visual representation of hotspots. |
| 4 | [13] | Movie reviews | Lexicon, NB and Linear SVM | Bookmyshow, IMDB, Rotten Tomatoes, Netflix | Python 3.4 and Natural Language Toolkit (NLTK). |
| 5 | [14] | Microblogging and e-commerce | SVMperf | weibo.com, suning.com | LIBSVM, word2vec. |
| 6 | [15] | Customer purchase intention analysis | Lexicon, NB, SVM | - | Analysis applied on Malay language. |
| 7 | [16] | Location-based real-time top trending events | Unigram, bigram, NB and SVM | Twitter data | “Where on Earth Id” (WOEID). Performs real-time sentiment analysis. R tool with ‘sentiment’ lexicon package uses ‘classify_polarity’ function to classify tweets. A generic model. |
| 8 | [17] | Stock market prediction | Lexicon and DENCLUE | NSE | Two models are used. One uses sentiment analysis whereas the other uses clustering techniques for prediction. |
| 9 | [18] | Large email data | SVM, DT, NB, Logistic, One R regression | Enron | BoWs (Bag of Words) model; POS tagging process is exclusively used for SWN labeling purpose. Apache Lucene libraries. |
| 10 | [19] | Trip recommendation | Tensor factorization, sentiment utility logistic model | TripAdvisor | Integration of tensor factorization (TF) and sentiment utility logistic model (SULM). |
Table 2: Hybrid sentiment analysis survey.
| SN | Ref. | Dataset | Language | G* | Techniques | Adv | Disadv | Gaps/future work | Evaluation |
| 1 | [22] | Movie review corpora: Pang and Lee, HM*, GI* and OL* | English | D | Back Propagation Artificial Neural Network; Information Gain; Vector Space Model | Performance; scalability; robust; suitable for large datasets | Error function consideration; needs large data for training | Could employ on different domains | Accuracy: HM=95%, OL=89%, GI=86% |
| 2 | [23] | Articles on scientific reviews | Spanish, English or Portuguese | D, S | POS tagging, a scoring algorithm and SVM | Flexibility of the SVM | Small size of the data set; higher computational cost | Deep learning methods have not been tested; requires the usage of the scoring algorithm and training the SVM classifier | Binary class 71%; ternary class 58%; 5-point class 37% |
| 3 | [24] | Online mobile phone reviews | English | D, S | Associating modified K-means algorithm with Naïve Bayes classification and KNN | Modified k-means algorithm avoids getting into locally optimal solution to some degree; reduces the adoption of cluster-error criterion; faster than other ML algorithms | May need a huge amount of data for training | Can be worked on different domain data and languages | Accuracy 91% |
| 4 | [25] | Movie Review Dataset by Pang and Lee | English | S | Semantic rules, fuzzy sets and an enriched sentiment lexicon, improved with the support of SentiWordNet | Proposed hybrid system achieved higher accuracy and precision than Naive Bayes (NB) and Maximum Entropy (ME) | Functioning depends on the rules designed | Additional properties and characteristics can be determined, like sentences that are borderline between being subjective or objective | Accuracy 76%; Precision 73%; Recall 83%; F1-score 77% |
| 5 | [26] | HCTS, STS and Sanders Corpus (STC) dataset | English | A | Association Rule Mining (ARM) augmented with a heuristics combination in part-of-speech (POS) patterns for detecting explicit single and multi-word aspects; PCA feature selection; SVM classification | They do not classify the context-dependent opinion words; the best performance of the hybrid method was also produced by ABSA + SentiWordNet + ELM + Unigram, where the value was 0.84126 | Association rule mining is a time-consuming process | Conduct experiments with other social media data such as YouTube and Facebook using the proposed hybrid sentiment classification approach in order to identify sentiment of people towards certain issues | PCA, LSA, RP feature selection: 76.55, 71.62 and 74.24% respectively |
| 6 | [27] | Stock market dataset | English | D | Hybrid GARCH and artificial neural network framework. Parameters studied: stock market returns and variance | Negative sentiment does not seem to influence volatility | Domain dependent | Could employ on different datasets | RMSE is 0.0005 |
| 7 | [28] | Customer reviews | English | A | Rule-based hybrid approach exploits sequential patterns and normalized Google distance (NGD); particle swarm | Implicit aspects are also extracted with explicit; synonym grouping helps optimization | Performance depends on synonym groupings; computing resources needed are high | Real stream data processing would make the system slow | - |
| 8 | [29] | NLPCC 2014 product reviews | English and Chinese | D | RNNs with LSTM, NB-SVM, word2vec and bag-of-words | Performance is improved; can learn more linguistic phenomena when more background knowledge is available | Inconsistent performance for diverse languages | Could be applied on different languages and domains | Accuracy rate 89% |
| 9 | [30] | Movie review dataset | English | S | Semantic rules, fuzzy sets, unsupervised machine learning techniques and a sentiment lexicon improved with the support of SentiWordNet | Identifies different strengths (intensity) in the polarity degree | Accuracy depends on semantic rules defined | Mapping of methods for MRLs | Accuracy=76%; Precision=73%; Recall=83%; F1=77% |
| 10 | [31] | Amazon movie review dataset | English | D | CNN as a feature extractor from the embeddings; cuPSONN and PNN for classification; PNN preceded by t-statistic based feature selection (t-statistic-PNN) | CNN-PNN statistically significant wrt CNN-cuPSONN and t-statistic-PNN, but statistically the same DMLP; speeds up the convergence to the global optimum | Suffers from well-known drawbacks such as entrapping into local optima and long convergence time | Evolutionary methods can be used, such as Differential Evolution, Particle Swarm Optimization (PSO), Ant Colony Optimization | AUC=95.44% |
| 11 | [32] | Yelp dataset; Restaurants dataset | English | A | CNN and fully-connected DNN architectures | Assembles more complex patterns using smaller and simpler patterns; uses relatively little pre-processing | Words with sentiment scores specific to the domain (here Bookstores), and have shown significant difference in updated scores after lexicon generation when compared to corresponding SentiWordNet scores | Predictive ability of our model deteriorates as the length of interval starts increasing, and specifically beyond 6 months | Accuracy = 90% |
| 12 | [34] | UCI ML, IMDB, Amazon, and Yelp | English | D | A novel genetic algorithm (GA)-based feature reduction technique; customized fitness function | Solves scalability issue that arises as the feature-set grows; reduces the feature-set size by up to 42%; optimized feature selection; provides run-time analysis of our GA-based feature reduction algorithm | Could cause overfitting; computing time needed is very high | | Accuracy = 95% |
| 13 | [33] | Pang and Lee’s Movie Review Dataset | English | W2D | Combination of sentiment classifiers and negation mappers | Hybrid approach of sentiment classifiers and negation mappers addresses the issue of polarity shift created by explicit negation modifiers | Only negation handled | Associations among the words other than negation remain unsolved | Accuracy using 10-fold CV: 77.3 |
| 14 | [35] | Movie Review Dataset; Kaggle’s BoWs meets BoP dataset; UCI’s Sentiment Labelled | English | W2P2S2D | Combination of sentiment classifiers and language parser (Stanford Parser) | Classification based on syntactic and semantic structure of sentence in review | Association of words only within sentence is addressed | Association among inter-sentence and inter-document level can be focussed | Accuracy = 93.9, 96.3, 99.1 |
| 15 | [36] | ChnSentiCorp-htl-4000 and ChnSentiCorp-nb-4000; Tan3. HowNet | Chinese | D | Fitness proportionate selection binary particle swarm optimization (FS-BPSO); sentiment classification oriented feature selection domain (SCO-FS-BPSO) | Overcomes unreasonable update formula of velocity and lack of evaluation on every single feature | Additional free parameters will make it more difficult to tune the algorithm perfectly | More regional languages for lodging complaints, and also identifying whether the user is giving a suggestion or registering a complaint | Accuracy = 84.50% (89.50%) on hotel review dataset and 90.58% (93.84%) on laptop review dataset |
G*: Granularity, which takes the values D = Document, S = Sentence, A = Aspect, W2D = Word to Document, W2P2S2D = Word to Phrase to Sentence to Document.
HM: Hatzivassiloglou and McKeown, GI: General Inquirer Lexicon, OL: Opinion Lexicon
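The evaluation column of Table 2 reports accuracy, precision, recall and F1-score. As a reminder of how these relate, the sketch below computes them from the four cells of a binary confusion matrix, under the assumption that the positive class is the class of interest; the counts in the example are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f1_score(precision, recall)

# Hypothetical counts for a binary sentiment classifier.
print(metrics(tp=8, fp=2, fn=2, tn=8))

# Precision 73% and recall 83%, as reported in rows 4 and 9 of Table 2.
print(round(f1_score(0.73, 0.83), 4))
```

Plugging in a precision of 0.73 and a recall of 0.83 gives an F1 of about 0.777, close to the 77% listed in those rows given that the published precision and recall are themselves rounded.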
REFERENCES
[1] Alena Neviarouskaya, Helmut Prendinger, and Mitsuru Ishizuka. 2010.
Recognition of affect, judgment, and appreciation in text. In Proceedings of
the 23rd International Conference on Computational Linguistics (COLING
'10). Association for Computational Linguistics, Stroudsburg, PA, USA,
806-814.
[2] Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting
the semantic orientation of adjectives. In Proceedings of the 35th Annual
Meeting of the Association for Computational Linguistics and Eighth
Conference of the European Chapter of the Association for Computational
Linguistics (ACL '98/EACL '98). Association for Computational
Linguistics, Stroudsburg, PA, USA,174-181. DOI:
https://doi.org/10.3115/976909.979640
[3] Walaa Medhat, Ahmed Hassan and Hoda Korashy, “Sentiment analysis algorithms and applications: A survey”, Elsevier BV, Ain Shams Engineering Journal, ISSN: 2090-4479, Vol: 5, Issue: 4, Page: 1093-1113,
(2014) 10.1016/j.asej.2014.04.011
[4] K. Ravi and V. Ravi, A survey on opinion mining and sentiment analysis:
tasks, approaches and applications, Knowledge-Based Systems (2015),
doi:10.1016/j.knosys.2015.06.015
[5] Filipe N. Ribeiro, Matheus Araújo, Pollyanna Gonçalves, Marcos André
Gonçalves and Fabrício Benevenuto, “SentiBench - a benchmark
comparison of state-of-the-practice sentiment analysis methods”, EPJ Data Science, July 2016, 5:23, DOI 10.1140/epjds/s13688-016-0085-1.
[6] Gerard Salton, Christopher Buckley, Term-weighting approaches in
automatic text retrieval, Information Processing & Management, Volume
24, Issue 5, 1988, Pages 513-523, ISSN 0306-4573,
https://doi.org/10.1016/0306-4573(88)90021-0.
[7] Emmanuel M., Khatri Saurabh M. and Babu D. R. Ramesh, "A Novel Scheme
for Term Weighting in Text Categorization: Positive Impact Factor,"
2013 IEEE International Conference on Systems, Man, and Cybernetics
(SMC), pp. 2292-2297, 13-16 Oct. 2013, doi:10.1109/SMC.2013.392.
[8] Ms Kranti Ghag and Dr. Ketan Shah, “Comparative Analysis of the Techniques for Sentiment Analysis”, ICATE 2013 Paper Identification Number-124
[9] Kim, Yoon. (2014). Convolutional Neural Networks for Sentence
Classification. Proceedings of the 2014 Conference on Empirical Methods
in Natural Language Processing. doi:10.3115/v1/D14-1181.
[10] Mukwazvure, A., & Supreethi, K. (2015, 09). A hybrid approach to
sentiment analysis of news comments. 2015 4th International Conference
on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends
and Future Directions). doi:10.1109/icrito.2015.7359282.
[11] Creamer, G. G., Ren, Y., Sakamoto, Y., & Nickerson, J. V. (2013, 09).
News and Sentiment Analysis of the European Market with a Hybrid
Expert Weighting Algorithm. 2013 International Conference on Social
Computing. doi:10.1109/socialcom.2013.61.
[12] Azeez, J., & Aravindhar, D. J. (2015, 08). Hybrid approach to crime
prediction using deep learning. 2015 International Conference on Advances
in Computing, Communications and Informatics (ICACCI).
doi:10.1109/icacci.2015.7275858
[13] Bandana, R. (2018, 05). Sentiment Analysis of Movie Reviews Using
Heterogeneous Features. 2018 2nd International Conference on Electronics,
Materials Engineering & Nano-Technology (IEMENTech).
doi:10.1109/iementech.2018.8465346
[14] Gao, K., Su, S., & Wang, J. (2015, 12). A sentiment analysis hybrid
approach for microblogging and E-commerce corpus. 2015 7th
International Conference on Modelling, Identification and Control
(ICMIC). doi:10.1109/icmic.2015.7409447
[15] Eshak, M. I., Ahmad, R., & Sarlan, A. (2017, 11). A preliminary study on
hybrid sentiment model for customer purchase intention analysis in social
commerce. 2017 IEEE Conference on Big Data and Analytics (ICBDA).
doi:10.1109/icbdaa.2017.8284108
[16] Haripriya, A., Kumari, S., & Babu, C. N. (2018, 09). Location Based Real-
time Sentiment Analysis of Top Trending Event Using Hybrid Approach.
2018 International Conference on Advances in Computing,
Communications and Informatics (ICACCI).
doi:10.1109/icacci.2018.8554457
[17] Rajput, V., & Bobde, S. (2016, 04). Stock market prediction using hybrid
approach. 2016 International Conference on Computing, Communication
and Automation (ICCCA). doi:10.1109/ccaa.2016.7813694
[18] Liu, S., & Lee, I. (2015, 11). A Hybrid Sentiment Analysis Framework for
Large Email Data. 2015 10th International Conference on Intelligent
Systems and Knowledge Engineering (ISKE). doi:10.1109/iske.2015.91
[19] Han, C., & Lin, B. (2018, 07). A Hybrid Model of Tensor Factorization
and Sentiment Utility Logistic Model for Trip Recommendation. 2018 1st
IEEE International Conference on Knowledge Innovation and Invention
(ICKII). doi:10.1109/ickii.2018.8569054
[20] Dos Santos, Cicero & Gatti de Bayser, Maira. (2014). Deep Convolutional
Neural Networks for Sentiment Analysis of Short Texts.
[21] Amit Kushwaha and Shubham Chaudhary. 2017. Review highlights:
opinion mining on reviews: a hybrid model for rule selection in aspect
extraction. In Proceedings of the 1st International Conference on Internet
of Things and Machine Learning (IML '17). ACM, New York, NY, USA,
Article 27, 6 pages. DOI: https://doi.org/10.1145/3109761.3158385
[22] Anuj Sharma and Shubhamoy Dey. 2012. An artificial neural network
based approach for sentiment analysis of opinionated text. In Proceedings
of the 2012 ACM Research in Applied Computation Symposium (RACS
'12). ACM, New York, NY, USA, 37-42. DOI:
http://dx.doi.org/10.1145/2401603.2401611
[23] Brian Keith, Exequiel Fuentes, Claudio Meneses, A Hybrid Approach for
Sentiment Analysis Applied to Paper Reviews, Proceedings of ACM
SIGKDD Conference, August 2017
[24] Ruchika Aggarwal, Latika Gupta, A Hybrid Approach for Sentiment
Analysis using Classification Algorithm, International Journal of Computer
Science and Mobile Computing, June 2017
[25] Appel, O., Chiclana, F., Carter, J., & Fujita, H. (2016). A Hybrid Approach
to Sentiment Analysis with Benchmarking Results. IEA/AIE.
[26] Zainuddin, N., Selamat, A., & Ibrahim, R. (2017, 12). Hybrid sentiment
classification on twitter aspect-based sentiment analysis. Applied
Intelligence. doi:10.1007/s10489-017-1098-6
[27] Olaniyan, Rapheal, et al. “Sentiment and Stock Market Volatility
Predictive Modelling — A Hybrid Approach.” 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015,
doi:10.1109/dsaa.2015.7344855.
[28] Rana, Toqir Ahmad, and Yu-N Cheah. “Hybrid Rule-Based Approach for
Aspect Extraction and Categorization from Customer Reviews.” 2015 9th International Conference on IT in Asia (CITA), 2015,
doi:10.1109/cita.2015.7349820.
[29] Liu, Guolong, et al. “A Hybrid Method for Bilingual Text Sentiment Classification Based on Deep Learning.” 2016 17th IEEE/ACIS
International Conference on Software Engineering, Artificial Intelligence,
Networking and Parallel/Distributed Computing (SNPD), 2016,
doi:10.1109/snpd.2016.7515884.
[30] Appel, Orestes, et al. “A Hybrid Approach to Sentiment Analysis.” 2016 IEEE Congress on Evolutionary Computation (CEC), 2016,
doi:10.1109/cec.2016.7744425.
[31] Dhariyal, B., Ravi, V., & Ravi, K. (2018). Sentiment analysis via Doc2Vec
and Convolutional Neural Network hybrids. 2018 IEEE Symposium Series
on Computational Intelligence (SSCI), 666-671.
[32] Thazhackal, Sharun S, and V. Susheela Devi. “A Hybrid Deep Learning Model to Predict Business Closure from Reviews and User Attributes
Using Sentiment Aligned Topic Model.” 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, doi:10.1109/ssci.2018.8628823.
[33] K. V. Ghag and K. Shah, “Negation Handling for Sentiment Classification,” 2016 International Conference on Computing Communication Control and Automation (ICCUBEA), 2016.
[34] Iqbal, Farkhund, et al. “A Hybrid Framework for Sentiment Analysis
Using Genetic Algorithm Based Feature Reduction.” IEEE Access, vol. 7, 2019, pp. 14637–14652, doi:10.1109/access.2019.2892852.
[35] K. V. Ghag and K. Shah, “Conceptual Sentiment Analysis Model,” International Journal of Electrical and Computer Engineering (IJECE), vol.
8, no. 4, p. 2358, 2018.
[36] L. Shang, Z. Zhou, and X. Liu, “Particle swarm optimization-based feature
selection in sentiment classification,” Soft Comput., vol. 20, no. 10, pp. 3821–3834, 2016.