Hybrid Sentiment Analysis: A Survey of Different Approaches and Techniques

Jayashree Jagdale

PhD Scholar, Computer Engineering

Pacific University (PAHER), India

[email protected]

Dr. Emmanuel M.

Professor, Information Technology

Pune Institute of Computer Technology

[email protected]

Abstract: Sentiment analysis has been attracting researchers for some time. Studies span the methods applied to extract opinions (also called sentiments or even emotions), data resource building and cross-domain classification. Capturing public opinion about social events, political movements, company strategies, marketing campaigns and product preferences is garnering increasing interest from the scientific community (for the exciting open challenges) and from the business world. This paper presents a survey of the different ways in which researchers have employed hybrid models for sentiment analysis using lexicon and machine learning techniques. The techniques are compared and reviewed to understand their applications and to provide a roadmap of hybrid sentiment analysis.

Keywords: Sentiment Analysis, Lexicon, Machine Learning, Neural Network, Human Behaviour.

I. INTRODUCTION

Sentiment analysis is a rapidly growing area that deals with automatically extracting people's opinions, emotions or attitudes from unstructured data, and hence understanding their sentiments. It is essentially an outgrowth of Information Retrieval and is also called opinion mining or subjectivity analysis. These opinions are much-needed input for a people-centric market, forming authentic feedback for businesses about their products and services and giving them opportunities and insight to deal with market competition. Researchers have been using data from social networking sites, blogs, chats, news, review websites and a long list of other resources for sentiment analysis. Knowingly or unknowingly, people express their opinions during conversations, through comments and in discussion forums.

Many applications could benefit, including entertainment and product review mining, product reputation analysis, spam filtering and tracking sentiments toward events. Intelligent systems have been built to extract facts associated with terrorist incidents, disease outbreaks, plane crashes, vehicle launches, management succession, joint ventures, corporate acquisitions, and job and seminar announcements, and to predict stock market scenarios, environmental conditions, business closures and more.

Sentiment analysis can be done at the word level, at the sentence level or on a document as a whole. At the document level, the sentiments in the entire document are aggregated, under the assumption that the whole document discusses the same feature. At the sentence level, each sentence is analyzed to extract its sentiment, and subjective sentences are classified as positive or negative. At the aspect level, the sentiment toward specific aspects of entities is studied [1].

II. RELATED WORK IN SENTIMENT ANALYSIS

In the first case, the presence of words expressing positive or negative sentiment is checked using data repositories such as SentiWordNet, in which dictionary entries are marked with the scores pos, neg and obj (for positive, negative and objective respectively), SentiFul, etc. In information retrieval and natural language processing, the set-of-words model, a form of Vector Space Modelling, converts unstructured text (a sentence, a paragraph or a document) into a structured form that contains a vector of words and their relationship with the documents, disregarding grammar and even word order. At a later stage, feature selection methods based on sets of word n-grams and character n-grams were conceived, and they were observed to lead to the best results. The authors of [1] drew on WordNet, using words from a distinct dictionary as sources of sentiment-carrying base constituents and relating the patterns of compound constructions. If a set of adjectives is given predetermined orientation tags such as positive or negative, and pairs of adjectives are joined by conjunctions such as “or”, “and”, “either-or”, “but” or “neither-nor”, it is possible to predict the orientation of the two conjoined adjectives, as in “A beautiful and fresh fruit” or “A good script but poor dialogues” [2].

A method based on semi-supervised minimum cut algorithms and distributional similarity is used to assign subjectivity tags to word senses. One idea is to add this polarity information to the adjectives in WordNet. Two anchor words at the extremes of the polarity spectrum are chosen, and the pointwise mutual information (PMI) of each adjective W with respect to these anchors is calculated as Polarity Score(W) = PMI(W, excellent) - PMI(W, poor). Researchers also consider the intensity of the sentiment: “the movie was awesome” carries a higher positive value than “the movie was nice”. A number of approaches exist to study the underlying affective state or grammatical options: Semantic Trees, Keyword Spotting,

JASC: Journal of Applied Science and Computations

Volume VI, Issue V, May/2019

ISSN NO: 1076-5131

Page No:2718


Latent Semantic Analysis, Rule-Based Modelling, Transformation-Centered Learning, World Knowledge Modelling, Key Phrase Spotting and Naive Bayesian Networks. As the data is large, it becomes necessary to classify it for efficiency. The survey papers [3][4][5] give a comprehensive overview of recent developments in this field. Their authors classify the approaches used for sentiment analysis as lexicon based, machine learning and hybrid, and discuss various algorithms applied to different domains. The study can be extended by applying those methods to other data sets or by applying different algorithms at various phases of sentiment analysis. The survey [3] takes a closer look at fields such as task/objective (sentiment identification, resource building, etc.), domain orientation, algorithms used, polarity, data scope, data set/source and language (if cross-domain). The creation of lexicons is called Building Resources; its aim is to create corpora, and sometimes dictionaries, in which opinion expressions are annotated according to their polarity [3]. The authors of [4] extend the survey to a much larger body of articles, which they classify by the granularity at which sentiment analysis is done: document level, sentence level, word level, aspect level, concept level, phrase level, clause level, sense level and link based. This extensive survey helps beginners in the area. The authors not only discuss the steps needed for sentiment analysis but also list the tools available to carry out each step, and they compare the accuracy reported by the various papers that implement sentiment analysis algorithms.
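The PMI-based polarity score given earlier in this section, Polarity Score(W) = PMI(W, excellent) - PMI(W, poor), can be illustrated with a minimal sketch. It assumes that co-occurrence is counted at document level and that an unseen pair is treated as neutral; the function names and the four-document toy corpus are hypothetical and are not taken from the surveyed papers.

```python
import math

def pmi(word, anchor, docs):
    """PMI of two words, using document-level co-occurrence counts."""
    n = len(docs)
    has_word = sum(1 for d in docs if word in d)
    has_anchor = sum(1 for d in docs if anchor in d)
    both = sum(1 for d in docs if word in d and anchor in d)
    if 0 in (has_word, has_anchor, both):
        return 0.0  # assumption: an unseen pair counts as neutral, not -infinity
    return math.log2((both / n) / ((has_word / n) * (has_anchor / n)))

def polarity_score(word, docs, pos_anchor="excellent", neg_anchor="poor"):
    """Polarity Score(W) = PMI(W, excellent) - PMI(W, poor)."""
    return pmi(word, pos_anchor, docs) - pmi(word, neg_anchor, docs)

# Toy corpus: each document is a set of tokens (made up for illustration).
DOCS = [
    {"excellent", "crisp", "screen"},
    {"excellent", "crisp", "sound"},
    {"poor", "noisy", "fan"},
    {"poor", "noisy", "crisp"},
]

crisp_score = polarity_score("crisp", DOCS)
noisy_score = polarity_score("noisy", DOCS)
```

In this toy corpus "crisp" co-occurs more often with "excellent" than with "poor", so its score is positive, while "noisy" only ever appears with "poor" and receives a negative score.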

III. STATE OF THE ART IN SENTIMENT ANALYSIS

The steps of sentiment analysis can be roughly shown as:

Figure 1: Sentiment analysis process

Treating raw data includes tokenizing, eliminating tags, removing stopwords and discarding punctuation and other symbols. Pre-processing also includes tasks such as stemming and lemmatization.

2.1 Data Collection: A huge amount of data is available on the World Wide Web for study and analysis. Depending on the aim of the study, a data set can be chosen from the widely available ones or, if none is suitable, one has to be built. If a model is to be tested for accuracy and is not domain specific (unlike, say, one for the medical domain), any closely matching data set, i.e. one that satisfies the number and type of attributes needed, can be chosen.

2.2 Data pre-processing: Pre-processing involves all the natural language processing steps that ensure sentiments are recognised correctly.

2.2.1 Tokenization: Tokenization breaks a sentence into words, phrases, symbols or other meaningful tokens by removing punctuation marks. E.g. ‘isn’t’ becomes ‘isnt’.

2.2.2 Stopword removal: Stopwords are frequently occurring words, e.g. ‘the’, ‘is’, ‘a’, which play no role in the analysis as they carry no information. Keeping stopwords increases the dimensionality of the problem, which makes classification more difficult and less effective. The stopword list can be domain specific. Punctuation is discarded as well.

2.2.3 Stemming/Lemmatization: A word may have many forms, e.g. ‘connect’, ‘connected’, ‘connection’, ‘connectionless’. They carry similar meaning and hence can be kept in the root form, which reduces index size and search time.

2.2.4 Indexing/Synonym and/or Antonym Grouping: This step is optional; it is employed by a few modern methods devised to improve accuracy.
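The pre-processing steps above (tokenization, stopword removal and stemming) can be sketched as a small pipeline. This is only an illustration: the stopword list and the suffix list are hypothetical, and the suffix stripper is a deliberately crude stand-in for a real stemmer such as Porter's.

```python
import re

STOPWORDS = {"the", "is", "a", "an", "of", "and"}   # illustrative, not exhaustive
SUFFIXES = ("ionless", "ion", "ed", "ing", "s")     # crude, longest suffix first

def tokenize(text):
    """Lowercase, drop punctuation (so "isn't" becomes "isnt") and split on whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return cleaned.split()

def remove_stopwords(tokens):
    """Keep only tokens that are not in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]
```

With these assumptions, ‘connect’, ‘connected’, ‘connection’ and ‘connectionless’ all reduce to the same root, which is the index-size reduction described in the stemming step.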

2.3 Feature Selection in Sentiment Classification

2.3.1 Term presence and frequency: Individual words or word n-grams, along with their frequency counts, are called features. A term is represented either by a one indicating its presence or by a positive integer giving the number of times it is present in the document. Researchers have experimented with and devised various term weighting techniques, from TF-IDF [6] to Positive Impact Factor [7].
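As an illustration of term weighting, the sketch below computes a plain TF-IDF weight, tf(t, d) * log(N / df(t)), over tokenized documents. The function name is hypothetical, and this is only one of several TF-IDF variants in use (libraries often add smoothing); it is not necessarily the exact formulation of [6].

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

example = tf_idf([["good", "camera"], ["bad", "camera"]])
```

A term that occurs in every document, such as ‘camera’ here, gets weight zero, while terms that distinguish one document from the others get a positive weight.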

2.3.2 Parts of speech (POS): Nouns, adjectives and adverbs are extracted. They add information to the system, which helps the analysis.

2.3.3 Opinions: These are words commonly used to express opinions, like ‘good’, ‘best’, ‘love’ or ‘hate’. Opinions are also expressed in phrases, and they may not be expressed explicitly; this is called the implicit presence of opinions. For example: ‘it took me lifetime to figure out how to …’.

Figure 2: Types of sentiment classification techniques. [Diagram: Sentiment Classification branches into Lexicon Based, Machine Learning (Supervised, Unsupervised) and Hybrid.]

[Diagram blocks belonging to Figure 1: Data set, Collect Data, Pre-process data, Select Features, Classify sentiment, Sentiment Polarity.]


Different approaches have been tried in order to suit the domain or dataset, or to improve performance in terms of accuracy. They can be broadly classified into three main categories: lexicon based, machine learning and hybrid [3].

2.4 Sentiment Classification: Classification is the task of assigning a sentiment to an entity, which may be a document, a sentence or an aspect. The various methods are depicted in Figure 2.

Techniques for classification

2.4.1 Lexicon based: Under lexicon-based approaches, one can use either a dictionary-based or a corpus-based approach. The dictionary-based approach uses an existing dictionary, which is a collection of opinion words along with their positive (+ve) or negative (-ve) sentiment strength. Such dictionaries have been created with or without using an ontology. The corpus-based approach relies on the probability of occurrence of a sentiment word in conjunction with a positive or negative set of words, estimated by searching very large amounts of text, e.g. through Google search or AltaVista search.
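A dictionary-based classifier can be sketched in a few lines. The lexicon below is a tiny hypothetical one with made-up strengths; a real system would load a resource such as SentiWordNet instead, and summing strengths is only one simple aggregation rule.

```python
# Hypothetical opinion lexicon: word -> sentiment strength (+ve or -ve).
LEXICON = {"good": 1.0, "nice": 1.0, "awesome": 2.0, "poor": -1.0, "terrible": -2.0}

def lexicon_score(tokens, lexicon=LEXICON):
    """Sum the strengths of all opinion words found in the token list."""
    return sum(lexicon.get(token, 0.0) for token in tokens)

def lexicon_label(tokens, lexicon=LEXICON):
    """Map the aggregate score to a polarity label."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the lexicon stores strengths and not just signs, ‘the movie was awesome’ scores higher than ‘the movie was nice’, which reflects the notion of sentiment intensity.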

2.4.2 Machine learning: Machine learning tends to yield higher accuracy, while semantic orientation provides better generality. Machine learning can be further divided into supervised and unsupervised approaches. Some of the classifiers used in hybrid models are the supervised learning methods Decision Tree (DT), SVM, Neural Network (NN) and Naïve Bayes.

2.4.2.1 Supervised: Supervised learning techniques are those for which tagged training data is available for training the algorithms. The supervised methods most frequently used for sentiment classification are SVM, Naïve Bayes classifiers and Decision Trees.

a) Naive Bayes

A Naive Bayes classifier is based on the simple Bayes rule of probability. The model involves a simplifying conditional independence assumption [3]: given the class, the words of a text are treated as independent of each other. A text is assigned the class with the highest posterior probability computed with Bayes' theorem.
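A minimal sketch of such a classifier is given below, assuming a multinomial model over word counts with add-one (Laplace) smoothing. The class name and the four training examples are hypothetical and serve only to show where the independence assumption enters.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)   # log prior
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                # Independence assumption: per-word likelihoods simply multiply,
                # i.e. their logarithms add up.
                count = self.word_counts[label][t]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

model = NaiveBayes().fit(
    [["good", "movie"], ["great", "plot"], ["bad", "movie"], ["poor", "plot"]],
    ["pos", "pos", "neg", "neg"],
)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.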

b) Decision Tree

As the name suggests, this is a tree-based approach in which internal nodes represent features/aspects, edges represent conditions on feature values, and leaf nodes represent the categories that result from the decisions made at the internal nodes. A top-down approach is followed: the condition at the root is evaluated first, and traversal continues downwards until a class (leaf) is reached.
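The top-down traversal can be sketched with a hand-built tree. The tree, its feature names and its labels are hypothetical; in practice the tree would be learned from tagged training data, not written by hand.

```python
# Hand-built toy tree: internal nodes test a feature, edges are feature values,
# leaves carry the class label.
TREE = {
    "feature": "contains_not",
    "branches": {
        True: {"label": "negative"},
        False: {
            "feature": "contains_good",
            "branches": {
                True: {"label": "positive"},
                False: {"label": "neutral"},
            },
        },
    },
}

def classify(node, features):
    """Walk from the root down to a leaf, following the edge for each feature value."""
    while "label" not in node:
        node = node["branches"][features[node["feature"]]]
    return node["label"]
```

Each call visits one node per tree level, so classification cost grows with the depth of the tree and not with the number of training examples.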

c) Support Vector Machines

In comparisons, SVM is preferred over Naïve Bayes in many applications. It can use the kernel trick to handle data that is not linearly separable. SVM places each data point in a feature space and looks for a hyperplane that separates the classes with the largest margin, which often gives high accuracy in text classification problems. Test data points are mapped into this same space and are classified based on which side of the hyperplane they fall [3].
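The prediction step of a linear SVM, i.e. checking on which side of the hyperplane a test point falls, can be sketched as follows. The weight vector and bias are assumed to have been learned already (training is not shown), and the numbers in the usage example are made up.

```python
def decision_value(weights, bias, x):
    """Signed score w . x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def svm_predict(weights, bias, x):
    """Classify a feature vector by the side of the hyperplane it falls on."""
    return "positive" if decision_value(weights, bias, x) >= 0 else "negative"

# Hypothetical 2-feature model: feature 0 = count of positive words,
# feature 1 = count of negative words.
WEIGHTS, BIAS = [1.0, -1.0], 0.0
```

For example, a text with two positive words and one negative word maps to the vector [2, 1] and lands on the positive side of this hypothetical hyperplane.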

2.4.3 Hybrid: The hybrid approach combines the lexicon-based and machine learning approaches and is very common, with sentiment lexicons playing a key role in the majority of methods. The approaches and the most popular algorithms are those mentioned above. Machine learning has been applied at various stages of analysis, from pre-processing to sentiment classification, in combination with lexicon approaches to improve the accuracy of classification and prediction.
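One simple way to combine the two families is sketched below: the lexicon decides when its evidence is strong enough, and a trained classifier is consulted otherwise. This is an illustrative combination rule, not the method of any particular surveyed paper; the function name, the threshold and the stand-in fallback classifier are all hypothetical.

```python
def hybrid_label(tokens, lexicon, fallback, threshold=1.0):
    """Use the lexicon score when it is decisive, otherwise defer to a classifier.

    lexicon:  dict mapping word -> sentiment strength
    fallback: callable taking a token list and returning a label
    """
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if abs(score) >= threshold:
        return "positive" if score > 0 else "negative"
    return fallback(tokens)

TOY_LEXICON = {"good": 1.0, "bad": -1.0}

def toy_classifier(tokens):
    # Stand-in for a trained model (e.g. Naive Bayes or SVM); here it only
    # knows that reviews mentioning petrol consumption tend to be negative.
    return "negative" if "petrol" in tokens else "neutral"
```

The fallback matters for implicit sentiments: ‘the vehicle consumes lot of petrol’ contains no lexicon word, so only the learned component can label it.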

IV. COMPARISON AND REVIEW ANALYSIS

2.5 Issues in sentiment analysis: Sentiment analysis faces many challenges, as listed below. The general challenges discussed are handling negation based on a word's position in the sentence, handling polysemy, mapping slang, extended words, domain generalization, opinion object identification, maintaining opinion time, language generalization, feature matrix construction, hidden sentiment identification and updating/down-dating lexicons. Researchers study these challenges to find solutions for classifying or predicting sentiments.

1. Domain specific meaning: A word may carry different meanings depending on where it is used. In one case it may have a positive sentiment, whereas in another it may carry a negative sentiment. Ex- ‘The resolution of the system is high and the response time is also high.’ In this sentence the first ‘high’ shows a positive sentiment (about the resolution) but the second ‘high’ shows a negative sentiment (about the response time).

2. Interrogative Sentence: A question or a query may not carry any sentiments.

For example: What are the good features of an Activa?

3. Sarcastic Sentences: Sarcasm is a way of expressing a negative sentiment using words that carry positive sentiments. Recognizing sarcasm is a very challenging task. E.g. ‘You sing so well, sounds like someone clearing his throat!!’

4. Implicit Sentiments: Sometimes sentiments are not stated explicitly with words like ‘good’, ‘bad’ or ‘beautiful’, but the sentence still expresses them implicitly. Domain knowledge is a must to classify such sentences. Ex- ‘The vehicle consumes lot of petrol.’

5. Natural language Issues Change Place to Place: Some words may be abbreviated or used in short form by youngsters. E.g. ‘legitimate’ becomes ‘legit’, and ‘Amazon Prime’ is called ‘prime’. Analyzing such sentences may require some critical decision making.

6. Conditional sentences: Conditional statements may not

clearly specify the sentiments. Ex- If the picture quality of this

mobile camera is good I will buy the phone.

7. Understanding gap: Authors and readers may have different perspectives based on nationality, religion, political orientation, etc. E.g. ‘X political party won the elections’. This sentence can carry both a positive and a negative meaning, and its value varies from person to person: it has a positive sentiment for people belonging to party X, while the same sentence has a negative sentiment for supporters of the other parties.


8. Spam Reviews: Spam sentiments are those posted by a rival or competitor organization to increase the value of its own product or organization among users. Some politicians may use such spam reviews just for their publicity.

9. Sentiments can be unrelated to the core issue: Sometimes opinions may not be related to the product or issue. E.g. ‘I love this phone as it gifted to me by my father.’

10. Domain dependence: The model or algorithms designed may become domain dependent and hence less adaptable to changes.

11. Morphology of languages: Languages around the world have different syntax and semantics. Some languages are morphologically rich and hence require different data structures and more complex ways of processing sentences.

12. Compound or Multi-dimensional: Multiple (possibly opposite) sentiments may be expressed in the same sentence, with varying intensity. E.g. ‘I love watching Friends, but sometimes am bored of the repetitive jokes’

14. Handling Polysemy: It may be difficult to associate the opinion with the right feature when a word has multiple meanings, e.g. the word ‘cross’, or ‘bark’ in ‘The bark was painful’ [8].

1.2 Literature survey: Researchers have published studies and surveys of the different tasks carried out in models where hybrid parameters are explored to improve the efficiency of the system. To give a 360-degree view and in-depth knowledge of the field, researchers are leaving no aspect unexplored: they have studied, surveyed and applied a range of approaches and algorithms.

The aim here is to explore the use of hybrid methods, which use the methods mentioned above (and more), in sentiment classification or prediction.

Table 1: Applications with Hybrid Sentiment analysis (Note: Please refer on next page)

A hybrid approach can be defined as one that combines supervised and/or unsupervised machine learning algorithms with techniques from natural language processing. The sentiment analysis process encompasses various phases, which may be designed taking into consideration the applicability of traditional or evolutionary methods. To make an application available in real time, there is a need to work on the efficiency and accuracy of the algorithms.

The authors of [20] used a deep convolutional neural network to exploit character- to sentence-level information for SA of short texts. They worked on the Stanford Twitter Sentiment corpus (STS, Twitter messages) and the Stanford Sentiment Treebank (SSTb, movie reviews), reporting an accuracy of 85.7% for SSTb and 86.4% for STS.

There are various applications targeted by researchers like movie review, news etc. A model is built and the data set is

tested to check the performance of the model on various

domains depending on the availability, ease of use, presence

and variety of enough attributes in the data set that would be

required to test the said method.

In paper [20], a simple CNN with little hyperparameter tuning and static vectors is shown to achieve excellent results on multiple benchmarks. In paper [21], the authors use a hybrid technique combining a rule-based and a machine learning model. Syntactic rules are defined which can be utilized to proficiently extract aspects and opinions from a multi-label classifier. Machine learning is employed to learn the fitness of the syntactic rules defined. Review highlights make way for various relation extraction techniques for noise elimination or minimization.

Apart from this study, various authors have employed hybrid models for morphologically rich languages such as Arabic, Malayalam and even German, and for multilingual sentiment analysis, exploiting large numbers of tweets or Amazon movie reviews and using lexicon-based, supervised and unsupervised algorithms to achieve the best accuracy. Experiments are performed with tools like Weka and R using SentiWordNet. Multi-class sentiment analysis has been explored on real-time micro-blogs, a Weibo user profiling system, camera review data, SemEval2014, and electronic product and restaurant datasets, employing techniques like TF-IDF, Naïve Bayes, LDA, SVM, GBDT, LSTM, RNN and lexicon-based methods. Different techniques are explored, like hybrid hierarchical classification methods for extracting adjectives for implicit aspects and extracting new words. Most are aspect-based studies leading to fine-grained extraction with local and global attention networks.

Table 2: Comparison of Hybrid approaches (Note: Please refer on next page)

V. CONCLUSIONS

This paper presents a survey of the techniques used by researchers in sentiment analysis through the hybrid approach. Many researchers are turning their attention from lexicon and machine learning techniques towards using neural networks at different phases of analysis to achieve better results. The techniques are compared and reviewed to understand their applications and to provide a roadmap of hybrid sentiment analysis. Authors have focused on dimensionality reduction and optimization of the operations. Computational cost and complexity of implementation used to make hybrid approaches difficult to implement; with recent advancements in hardware and decreasing cost, it has become relatively easy to experiment with variations. To make applications available in real time, there is a need to work on the efficiency and accuracy of the algorithms. Resource building, cross-domain classification and capturing public opinion about social events, political movements, company strategies, marketing campaigns and product preferences are garnering increasing interest from the scientific community (for the exciting open challenges) and from the business world.


Table 1: Applications with hybrid sentiment analysis.

| SN | Ref. | Application | Methods used | Dataset | Remarks |
| --- | --- | --- | --- | --- | --- |
| 1 | [10] | News | Lexicon, KNN and SVM | www.theguardian.com | Use of Scrapy 1.0.1 in Python, MongoDB. Technology, Politics and Business sections of news are considered. |
| 2 | [11] | News and market | Expert and hybrid weighting scheme | Thomson Reuters news stories | Use of crowd sourcing and experts in different configurations and Matthews correlation coefficient (MCC) using the ANOVA procedure. |
| 3 | [12] | Crime detection | Deep learning, LDA | Twitter posts | Use of the graph database model, output as visual representation of hotspots |
| 4 | [13] | Movie reviews | Lexicon, NB and Linear SVM | Bookmyshow, IMDB, Rotten Tomatoes, Netflix | Python 3.4 and Natural Language Toolkit (NLTK) |
| 5 | [14] | Microblogging and e-commerce | SVMperf | weibo.com, suning.com | LIBSVM, word2vec |
| 6 | [15] | Customer purchase intention analysis | Lexicon, NB, SVM | - | Analysis applied on Malay language |
| 7 | [16] | Location based real time top trending events | Unigram, bigram, NB and SVM | Twitter data | “Where on Earth Id” (WOEID). Performs real time sentiment analysis. R tool with ‘sentiment’ lexicon package uses ‘classify_polarity’ function to classify tweets. A generic model. |
| 8 | [17] | Stock market prediction | Lexicon and DENCLUE | NSE | Two models are used. One uses sentiment analysis whereas other uses clustering techniques for prediction. |
| 9 | [18] | Large email data | SVM, DT, NB, Logistic, One R regression | Enron | BoWs (Bag of Words) model, POS tagging process is exclusively used for SWN labeling purpose. Apache Lucene libraries |
| 10 | [19] | Trip recommendation | Tensor factorization, Sentiment Utility logistic model | Trip advisor | Integration of tensor factorization (TF) and sentiment utility logistic model (SULM). |

Table 2: Hybrid sentiment analysis survey.

| SN | Ref. | Dataset | Language | G* | Techniques | Advantages | Disadvantages | Gaps/future work | Evaluation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [22] | Movie review corpora: Pang and Lee, HM*, GI* and OL* | English | D | Back-propagation artificial neural network, information gain, vector space model | Performance; scalability; robust; suitable for large datasets | Error function consideration; needs large data for training | Could be employed on different domains | Accuracy: HM = 95%, OL = 89%, GI = 86% |
| 2 | [23] | Articles on scientific reviews | Spanish, English or Portuguese | D, S | POS tagging, a scoring algorithm and SVM | Flexibility of the SVM | Small size of the data set; higher computational cost | Deep learning methods have not been tested; requires the usage of the scoring algorithm and training the SVM classifier | Binary class: 71%; ternary class: 58%; 5-point class: 37% |
| 3 | [24] | Online mobile phone reviews | English | D, S | Modified k-means algorithm associated with Naïve Bayes classification and KNN | Modified k-means algorithm avoids getting into a locally optimal solution to some degree; reduces the adoption of the cluster-error criterion; faster than other ML algorithms | May need a huge amount of data for training | Can be applied to different domain data and languages | Accuracy 91% |
| 4 | [25] | Movie review dataset by Pang and Lee | English | S | Semantic rules, fuzzy sets and an enriched sentiment lexicon, improved with the support of SentiWordNet | Proposed hybrid system achieved higher accuracy and precision than Naive Bayes (NB) and Maximum Entropy (ME) | Functioning depends on the rules designed | Additional properties and characteristics can be determined, such as sentences that are borderline between being subjective or objective | Accuracy 76%; Precision 73%; Recall 83%; F1-score 77% |
| 5 | [26] | HCTS, STS and Sanders Twitter Corpus (STC) dataset | English | A | Association Rule Mining (ARM) augmented with a heuristics combination in part-of-speech (POS) patterns for detecting explicit single and multi-word aspects; PCA feature selection; SVM classification | Does not classify the context-dependent opinion words; the best performance of the hybrid method was also produced by ABSA + SentiWordNet + ELM + unigram, where the value was 0.84126 | Association rule mining is a time-consuming process | Conduct experiments with other social media data, such as YouTube and Facebook, using the proposed hybrid sentiment classification approach in order to identify the sentiment of people towards certain issues | PCA, LSA and RP feature selection: 76.55%, 71.62% and 74.24% respectively |
| 6 | [27] | Stock market dataset | English | D | Hybrid GARCH and artificial neural network framework; parameters studied: stock market returns and variance | Negative sentiment does not seem to influence volatility | Domain dependent | Could be employed on different datasets | RMSE is 0.0005 |
| 7 | [28] | Customer reviews | English | A | Rule-based hybrid approach exploiting sequential patterns and normalized Google distance (NGD); particle swarm | Implicit aspects are also extracted along with explicit ones; synonym grouping helps optimization | Performance depends on synonym groupings; computing resources needed are high | Real stream data processing would make the system slow | - |
| 8 | [29] | NLPCC 2014 product reviews | English and Chinese | D | RNNs with LSTM, NB-SVM, word2vec and bag-of-words | Performance is improved; can learn more linguistic phenomena when more background knowledge is available | Inconsistent performance for diverse languages | Could be applied to different languages and domains | Accuracy rate 89% |
| 9 | [30] | Movie review dataset | English | S | Semantic rules, fuzzy sets, unsupervised machine learning techniques and a sentiment lexicon improved with the support of SentiWordNet | Identifies different strengths (intensity) in the polarity degree | Accuracy depends on the semantic rules defined | Mapping of methods for MRLs | Accuracy = 76%; Precision = 73%; Recall = 83%; F1 = 77% |
| 10 | [31] | Amazon movie review dataset | English | D | CNN as a feature extractor from the embeddings; cuPSONN and PNN for classification; PNN preceded by t-statistic based feature selection (t-statistic-PNN) | CNN-PNN statistically significant w.r.t. CNN-cuPSONN and t-statistic-PNN, but statistically the same as DMLP; speeds up the convergence to the global optimum | Suffers from well-known drawbacks such as entrapment in local optima and long convergence time | Evolutionary methods can be used, such as Differential Evolution, Particle Swarm Optimization (PSO) and Ant Colony Optimization | AUC = 95.44% |
| 11 | [32] | Yelp dataset; Restaurants dataset | English | A | CNN and fully-connected DNN architectures | Assembles more complex patterns using smaller and simpler patterns; uses relatively little pre-processing | Words with sentiment scores specific to the domain (here Bookstores) have shown significant differences in updated scores after lexicon generation when compared to the corresponding SentiWordNet scores | Predictive ability of the model deteriorates as the length of the interval increases, specifically beyond 6 months | Accuracy = 90% |
| 12 | [34] | UCI ML, IMDB, Amazon and Yelp | English | D | A novel genetic algorithm (GA)-based feature reduction technique; customized fitness function | Solves the scalability issue that arises as the feature set grows; reduces the feature-set size by up to 42%; optimized feature selection; provides run-time analysis of the GA-based feature reduction algorithm | Could cause overfitting; computing time needed is very high | | Accuracy = 95% |
| 13 | [33] | Pang and Lee's movie review dataset | English | W2D | Combination of sentiment classifiers and negation mappers | Hybrid approach of sentiment classifiers and negation mappers addresses the issue of polarity shift created by explicit negation modifiers | Only negation is handled | Associations among words other than negation remain unsolved | Accuracy using 10-fold CV = 77.3 |
| 14 | [35] | Movie review dataset; Kaggle's BoWs meets BoP dataset; UCI's Sentiment Labelled | English | W2P2S2D | Combination of sentiment classifiers and language parser (Stanford Parser) | Classification based on the syntactic and semantic structure of sentences in the review | Association of words is addressed only within a sentence | Association at the inter-sentence and inter-document level can be focused on | Accuracy = 93.9, 96.3, 99.1 |
| 15 | [36] | ChnSentiCorp-htl-4000 and ChnSentiCorp-nb-4000, Tan3. HowNet | Chinese | D | Fitness proportionate selection binary particle swarm optimization (FS-BPSO); sentiment classification oriented feature selection domain (SCO-FS-BPSO) | Overcomes the unreasonable update formula of velocity and the lack of evaluation on every single feature | Additional free parameters will make it more difficult to tune the algorithm perfectly | More regional languages for lodging complaints, and identifying whether the user is giving a suggestion or registering a complaint | Accuracy = 84.50% (89.50%) on hotel review dataset and 90.58% (93.84%) on laptop review dataset |

G- Granularity, which takes the values D=Document, S=Sentence, A=Aspect, W2D=Word to Document, W2P2S2D=Word to Phrase to Sentence to Document.

HM- Hatzivassiloglou and McKeown, GI- General Inquirer Lexicon, OL- Opinion Lexicon
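
The Evaluation column mixes accuracy, precision, recall and F1-score. As a reminder of how these measures relate, the following Python sketch computes them from binary predictions. The function name classification_metrics and the toy labels are assumptions made for illustration; they are not data from the surveyed papers.

```python
def classification_metrics(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and classifier outputs for eight reviews.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy, AUC and RMSE measure different things, figures reported under different measures in the table are not directly comparable across rows.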

REFERENCES

[1] Alena Neviarouskaya, Helmut Prendinger, and Mitsuru Ishizuka. 2010.

Recognition of affect, judgment, and appreciation in text. In Proceedings of

the 23rd International Conference on Computational Linguistics (COLING

'10). Association for Computational Linguistics, Stroudsburg, PA, USA,

806-814.

[2] Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting

the semantic orientation of adjectives. In Proceedings of the 35th Annual

Meeting of the Association for Computational Linguistics and Eighth

Conference of the European Chapter of the Association for Computational

Linguistics (ACL '98/EACL '98). Association for Computational

Linguistics, Stroudsburg, PA, USA, 174-181. DOI:

https://doi.org/10.3115/976909.979640

[3] Walaa Medhat, Ahmed Hassan and Hoda Korashy, “Sentiment analysis algorithms and applications: A survey”, Elsevier BV, Ain Shams Engineering Journal, ISSN: 2090-4479, Vol: 5, Issue: 4, Page: 1093-1113,

(2014) 10.1016/j.asej.2014.04.011

[4] K. Ravi and V. Ravi, A survey on opinion mining and sentiment analysis:

tasks, approaches and applications, Knowledge-Based Systems (2015),

doi: http://dx.doi.org/10.1016/j.knosys.2015.06.015

[5] Filipe N. Ribeiro, Matheus Araujo, Pollyanna Goncalves,

André Gonçalves and Fabrício Benevenuto, “SentiBench - a benchmark

comparison of state-of-the-practice sentiment analysis methods”, EPJ Data Science, July 2016, 5:23, DOI 10.1140/epjds/s13688-016-0085-1.

[6] Gerard Salton, Christopher Buckley, Term-weighting approaches in

automatic text retrieval, Information Processing & Management, Volume

24, Issue 5, 1988, Pages 513-523, ISSN 0306-4573, https://doi.org/10.1016/0306-4573(88)90021-0.

[7] Emmanuel M., Khatri Saurabh M.; Babu D.R.Ramesh, "A Novel Scheme

for Term Weighting in Text Categorization: Positive Impact Factor,"

2013 IEEE International Conference on Systems, Man, and

Cybernetics (SMC), pp. 2292-2297, 13-16 Oct. 2013, doi: 10.1109/SMC.2013.392.

[8] Ms Kranti Ghag and Dr. Ketan Shah, “Comparative Analysis of the Techniques for Sentiment Analysis”, ICATE 2013 Paper Identification Number-124

[9] Kim, Yoon. (2014). Convolutional Neural Networks for Sentence

Classification. Proceedings of the 2014 Conference on Empirical Methods

in Natural Language Processing. 10.3115/v1/D14-1181.

[10] Mukwazvure, A., & Supreethi, K. (2015, 09). A hybrid approach to

sentiment analysis of news comments. 2015 4th International Conference

on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends

and Future Directions). doi:10.1109/icrito.2015.7359282.

[11] Creamer, G. G., Ren, Y., Sakamoto, Y., & Nickerson, J. V. (2013, 09).

News and Sentiment Analysis of the European Market with a Hybrid

Expert Weighting Algorithm. 2013 International Conference on Social

Computing. doi:10.1109/socialcom.2013.61.

[12] Azeez, J., & Aravindhar, D. J. (2015, 08). Hybrid approach to crime

prediction using deep learning. 2015 International Conference on Advances

in Computing, Communications and Informatics (ICACCI).

doi:10.1109/icacci.2015.7275858

[13] Bandana, R. (2018, 05). Sentiment Analysis of Movie Reviews Using

Heterogeneous Features. 2018 2nd International Conference on Electronics,

Materials Engineering & Nano-Technology (IEMENTech).

doi:10.1109/iementech.2018.8465346

[14] Gao, K., Su, S., & Wang, J. (2015, 12). A sentiment analysis hybrid

approach for microblogging and E-commerce corpus. 2015 7th

International Conference on Modelling, Identification and Control

(ICMIC). doi:10.1109/icmic.2015.7409447

[15] Eshak, M. I., Ahmad, R., & Sarlan, A. (2017, 11). A preliminary study on

hybrid sentiment model for customer purchase intention analysis in social

commerce. 2017 IEEE Conference on Big Data and Analytics (ICBDA).

doi:10.1109/icbdaa.2017.8284108

[16] Haripriya, A., Kumari, S., & Babu, C. N. (2018, 09). Location Based Real-

time Sentiment Analysis of Top Trending Event Using Hybrid Approach.

2018 International Conference on Advances in Computing,

Communications and Informatics (ICACCI).

doi:10.1109/icacci.2018.8554457

[17] Rajput, V., & Bobde, S. (2016, 04). Stock market prediction using hybrid

approach. 2016 International Conference on Computing, Communication

and Automation (ICCCA). doi:10.1109/ccaa.2016.7813694

[18] Liu, S., & Lee, I. (2015, 11). A Hybrid Sentiment Analysis Framework for

Large Email Data. 2015 10th International Conference on Intelligent

Systems and Knowledge Engineering (ISKE). doi:10.1109/iske.2015.91

[19] Han, C., & Lin, B. (2018, 07). A Hybrid Model of Tensor Factorization

and Sentiment Utility Logistic Model for Trip Recommendation. 2018 1st

IEEE International Conference on Knowledge Innovation and Invention

(ICKII). doi:10.1109/ickii.2018.8569054

[20] Dos Santos, Cicero & Gatti de Bayser, Maira. (2014). Deep Convolutional

Neural Networks for Sentiment Analysis of Short Texts.

[21] Amit Kushwaha and Shubham Chaudhary. 2017. Review highlights:

opinion mining on reviews: a hybrid model for rule selection in aspect

extraction. In Proceedings of the 1st International Conference on Internet

of Things and Machine Learning (IML '17). ACM, New York, NY, USA,

Article 27, 6 pages. DOI: https://doi.org/10.1145/3109761.3158385

[22] Anuj Sharma and Shubhamoy Dey. 2012. An artificial neural network

based approach for sentiment analysis of opinionated text. In Proceedings

of the 2012 ACM Research in Applied Computation Symposium (RACS

'12). ACM, New York, NY, USA, 37-42. DOI:

http://dx.doi.org/10.1145/2401603.2401611

[23] Brian Keith, Exequiel Fuentes, Claudio Meneses, A Hybrid Approach for

Sentiment Analysis Applied to Paper Reviews, Proceedings of ACM

SIGKDD Conference, August 2017

[24] Ruchika Aggarwal, Latika Gupta, A Hybrid Approach for Sentiment

Analysis using Classification Algorithm, International Journal of Computer

Science and Mobile Computing, June 2017

[25] Appel, O., Chiclana, F., Carter, J., & Fujita, H. (2016). A Hybrid Approach

to Sentiment Analysis with Benchmarking Results. IEA/AIE.

[26] Zainuddin, N., Selamat, A., & Ibrahim, R. (2017, 12). Hybrid sentiment

classification on twitter aspect-based sentiment analysis. Applied

Intelligence. doi:10.1007/s10489-017-1098-6

[27] Olaniyan, Rapheal, et al. “Sentiment and Stock Market Volatility

Predictive Modelling — A Hybrid Approach.” 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015,

doi:10.1109/dsaa.2015.7344855.

[28] Rana, Toqir Ahmad, and Yu-N Cheah. “Hybrid Rule-Based Approach for

Aspect Extraction and Categorization from Customer Reviews.” 2015 9th International Conference on IT in Asia (CITA), 2015,

doi:10.1109/cita.2015.7349820.

[29] Liu, Guolong, et al. “A Hybrid Method for Bilingual Text Sentiment Classification Based on Deep Learning.” 2016 17th IEEE/ACIS

International Conference on Software Engineering, Artificial Intelligence,

Networking and Parallel/Distributed Computing (SNPD), 2016,

doi:10.1109/snpd.2016.7515884.

[30] Appel, Orestes, et al. “A Hybrid Approach to Sentiment Analysis.” 2016 IEEE Congress on Evolutionary Computation (CEC), 2016,

doi:10.1109/cec.2016.7744425.

[31] Dhariyal, B., Ravi, V., & Ravi, K. (2018). Sentiment analysis via Doc2Vec

and Convolutional Neural Network hybrids. 2018 IEEE Symposium Series

on Computational Intelligence (SSCI), 666-671.

[32] Thazhackal, Sharun S, and V. Susheela Devi. “A Hybrid Deep Learning Model to Predict Business Closure from Reviews and User Attributes

Using Sentiment Aligned Topic Model.” 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, doi:10.1109/ssci.2018.8628823.

[33] K. V. Ghag and K. Shah, “Negation Handling for Sentiment Classification,” 2016 International Conference on Computing Communication Control and Automation (ICCUBEA), 2016.

[34] Iqbal, Farkhund, et al. “A Hybrid Framework for Sentiment Analysis

Using Genetic Algorithm Based Feature Reduction.” IEEE Access, vol. 7, 2019, pp. 14637–14652, doi:10.1109/access.2019.2892852.

[35] K. V. Ghag and K. Shah, “Conceptual Sentiment Analysis Model,” International Journal of Electrical and Computer Engineering (IJECE), vol.

8, no. 4, p. 2358, 2018.

[36] L. Shang, Z. Zhou, and X. Liu, “Particle swarm optimization-based feature

selection in sentiment classification,” Soft Comput., vol. 20, no. 10, pp. 3821–3834, 2016.
