
On Using Deep Learning for Sentiment Analysis

Conor Brady

School of Computer Science and Statistics

University of Dublin

This dissertation is submitted for the degree of

BA(mod) in Computer Science

Trinity College April 2015


I dedicate this project to my parents, Jim and Méabh Brady, without whose unwavering support during my constant meandering over the last few years, I surely would not have had the opportunity to submit this for consideration for my undergraduate degree. I owe you more than I can ever repay.


Declaration

I hereby declare that this thesis is, except where otherwise stated, entirely my own work and that it has not been submitted as an exercise for a degree at any other university.

Conor Brady
April 2015


Acknowledgements

Foremost I would like to acknowledge my supervisor, Rozenn Dahyot, who was helpful, involved and supportive all the way through the last few months. I would like to thank Conor Murphy and Micheal Barry for reading drafts of this report and providing invaluable feedback. A special thanks to Paddy Corr, whose supply of cigarettes kept me going through the hard times. Last but not least, Sam Green: thanks for the speakers. They're great speakers.


Abstract

This project examines the application of deep learning to sentiment analysis. A technology developed at Stanford University in this field is described; it achieves state-of-the-art results in sentiment analysis of sentences of varying length. This technology is then applied in two novel implementations: firstly, a Firefox extension that parses text on a webpage and colours it according to sentiment; secondly, a Javascript visualisation of the sentiment of users as they tweet, during the Dublin Marathon 2014 and live over San Francisco. These technologies are then evaluated along with the Stanford sentiment analysis engine.


Table of contents

List of figures

1 Introduction
  1.1 Overview
  1.2 Technologies
      1.2.1 HTTP Sentiment Endpoint
      1.2.2 Sentiment Analysis Firefox Extension
      1.2.3 Geolocated Visualization of Tweets

2 Background
  2.1 Distributed Semantic Vector Representations
  2.2 Deep Learning
      2.2.1 Feature Engineering
      2.2.2 Artificial Feed-Forward Neural Network
  2.3 Learning Semantic Vectors
  2.4 Composition of Semantic Vectors
  2.5 Transformation to a Sentiment Space
  2.6 Training

3 Implementation
  3.1 The Stanford CoreNLP Java Library
      3.1.1 Configuration
      3.1.2 Deployment
  3.2 Sentiment Firefox Extension
      3.2.1 Selectors
      3.2.2 Sentence Boundaries
      3.2.3 Server Interaction
      3.2.4 Colouring
      3.2.5 HTTPS
      3.2.6 Overview Chart
  3.3 Sentiment Rain
      3.3.1 The TCD GRAISearch Dataset
      3.3.2 Server Stack
      3.3.3 Map
      3.3.4 Scenario-Tweets Resource
      3.3.5 Javascript Visualisation
      3.3.6 Sound
      3.3.7 Live

4 Experimental Results
  4.1 Stanford Sentiment Server
  4.2 Firefox Extension
  4.3 Sentiment Rain

5 Future Work
  5.1 Sentiment Analysis and Semantic Vectors
  5.2 Stanford Sentiment Server
  5.3 Firefox Extension
  5.4 Sentiment Rain

6 Conclusions

References


List of figures

2.1 Example of feature engineering to classify binary digits into their respective ⊕ output; introduction of the ∧ operator makes previously inseparable classes easily separable
2.2 Example of a feed-forward artificial neural network with two inputs, a hidden layer and one output
2.3 Graph of a common activation function (Equation 2.2) used in the functional units of Artificial Neural Networks
2.4 An example of a Recursive Neural Network, where the output is equal in length to each of its two child inputs
2.5 Example of the transformation of a semantic vector into the sentiment space
3.1 Stanford CoreNLP Maven dependency listing
3.2 Server library imports
3.3 Server initialisation code
3.4 Required API endpoint
3.5 Sentiment server response
3.6 The sentiment server endpoint's code
3.7 Firefox extension that highlights the sentiment of paragraphs of text
3.8 Firefox extension content selectors
3.9 Sentence boundary detection regexes
3.10 Headers required to enable Cross-Origin Resource Sharing
3.11 Function for calculating the colour from a sentiment value
3.12 Plugin running on Twitter over HTTPS
3.13 Sentiment Rain scenario showing the sentiment of Tweets during the Dublin Marathon in 2014
3.14 GRAISearch database endpoint
3.15 Sample required /scenario_tweets resource
3.16 Sentiment Rain architecture
3.17 Tweet selected by clicking a circle
3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location
3.19 An example of an ADSR envelope
3.20 Live view of tweets over San Francisco
4.1 Random sample of websites to test effectiveness of content selection from extension
4.2 Short movie review [5], colour coded based on sentiment
4.3 Short movie review [5], colour coded based on sentiment
4.4 Short movie review [5], colour coded based on sentiment
4.5 Short movie review [5], colour coded based on sentiment
4.6 Excerpt from the Wikipedia page of Dublin, Ireland
4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014


Chapter 1

Introduction

1.1 Overview

Deep Learning is a broad field of research, and this project has focused on its application to sentiment analysis, as set forth in [31]. Sentiment analysis is the determination of how positively or negatively the author of a piece of text feels about the subject of the text. This has applications in fields such as the prediction of stock market prices and corporate brand evaluation. [31] has led to state-of-the-art results in the analysis of the sentiment of sentences of variable length, with 85% successful classification of entire sentences into classes of positive or negative.

Chapter 2 discusses the technologies and ideas that lead to [31], with an emphasis on an abstract understanding of the mathematics behind it. It explains, stage by stage, the technologies that lead to the Recursive Neural Tensor Network, a technology that semantically composes a sentence up its syntax tree; a transformation of the output then reveals the sentiment information inherent in the sentence.

Using the library resulting from the efforts of the Stanford NLP Group, made available under the GPLv3 license, this report chronicles the creation of novel uses of the sentiment analysis engine with regard to various sources of text on the Internet. The process of building these technologies is described in Chapter 3.

Chapter 4 investigates the results from the implementations and evaluates their effectiveness.

Future development based on the results is considered in Chapter 5.

Finally, concluding remarks are documented in Chapter 6. These relate to evaluating the application of deep learning technologies to sentiment analysis and the effectiveness of the use of these technologies in the implementations described within.


1.2 Technologies

Three technologies have been developed as part of this final year project. While the first two applications (Sections 1.2.1 and 1.2.2) have been designed and developed independently with use of the Stanford CoreNLP library, the third has benefited from interaction with researchers on an ongoing European project coordinated by the supervisor, Prof. Rozenn Dahyot [13]. More specifically, Cyril Bourges, Marie Curie Research Fellow on this GRAISearch project, has provided access to the 'Dublin Marathon 2014' dataset hosted on a server in the School of Computer Science and Statistics in Trinity College.

1.2.1 HTTP Sentiment Endpoint

This is a server consisting of a wrapper around the library, as supplied by the Stanford NLP Group, that allows sentiment analysis to be carried out by any other service able to make HTTP requests across the Internet. It returns a number between 0 for negative and 1 for positive, allowing further processing to be carried out. This is the foundation of the other two technologies, and runs on its own server hosted by Amazon Web Services.

1.2.2 Sentiment Analysis Firefox Extension

An extension that, when installed in the Firefox web browser, automatically searches for passages of text in webpages. Once the passages are identified, it colour codes sentences depending on their sentiment: red for negative, black for neutral and green for positive, with a gradient between.

1.2.3 Geolocated Visualization of Tweets

Another novel application of sentiment analysis, this takes a dataset of tweets over a period, in this case the Dublin City Marathon 2014, and displays them on a map according to where they were tweeted over the course of the scenario. Using colour and music to signify the sentiment of each tweet, it creates an interactive audiovisual experience, where tweets can be intuitively viewed and navigated over the course of the scenario.


Chapter 2

Background

Deep learning has led to state-of-the-art innovation in the areas of Automatic Speech Recognition [35], Image Recognition [17] and Natural Language Processing [18]. This project specifically deals with its application to sentiment analysis through the creation of semantic vector representations. A brief introduction is provided in this chapter to the technologies that have led to the state-of-the-art results in sentiment analysis of sentences of varying length [31].

Section 2.1 provides an introduction to the idea of a semantic vector representation of an input and contrasts it with an atomic representation, with English words as the example.

Section 2.2 provides an introduction to the area of deep learning, the factors that lead to its success and how it is implemented in artificial neural networks.

Section 2.3 returns to semantic vectors and discusses how these can be learned by artificial neural networks.

Section 2.4 discusses how these semantic vectors, when learned to describe words in the English language, can be composed up a sentence's syntax tree, resulting in a single vector that describes the semantic meaning of the sentence.

Section 2.5 describes how vectors in the semantic space may be transformed into a sentiment space to distil the sentiment information from the semantic vectors.

Finally, Section 2.6 investigates how the Stanford NLP department trained these technologies with crowdsourced labelled data.

2.1 Distributed Semantic Vector Representations

Words exist as a means of communication between people, but they are just labels that people who speak the same language have agreed upon to communicate meaning. They offer nothing in so far as denoting what they describe. Words are dense representations in that each symbol of the word offers nothing in isolation; they are atomic units. In the alternative, a distributed representation, some of the dimensions may be lost and information about the object is still present.

For example, "cat" and "lion" trigger commonalities in the mind as they trigger internal representations, but are useless to computers for describing the objects they denote. What these systems need are their own distributed semantic vector representations. These semantic vectors describe, with each dimension, some feature of the input that is useful for solving meaningful problems.

2.2 Deep Learning

Deep learning is the process of learning successive semantic vector representations of data in a hierarchical manner. Each vector feeds into the next, more abstract vector to get more and more abstract representations. For example, given an input of an image on a network tasked to find faces, the first layer could detect edges, the next layer eyes, ears and noses, and a third could detect faces. Presented with raw image data at the input, the network generates successively more abstract vectors to eventually carry out a task at the output.

To return to the cat analogy, one feature of an abstract layer would be 'is it a cat?', followed on an even more abstract layer by 'is it a lion?', and these would take their cue from the lower layers of 'has it four legs?' or 'has it got fur?'. This is useful for simplifying the system, as at each layer information is concentrated, abstracted and discarded.

Learning proceeds by determining how much error the network produces at the output. The functional units that produce the output are modified and optimised to reduce that error and eventually generalise to produce the correct output for unseen inputs.

2.2.1 Feature Engineering

Traditional machine learning depends on feature engineering. Feature engineering is the practice of hand-producing a set of functions that transform an input into a semantic or feature vector. The goal is to make the feature vector a good representation of the input, whether that be by lowering the dimensionality and increasing the signal-to-noise ratio, or by performing non-linear operations that transform a dimension of the output vector, making it linearly separable from vectors of a different class. The engineer tries to define a semantic distributed representation by hand.

A toy example of feature engineering can be seen in Figure 2.1. Here the class of inputs to the exclusive disjunction (⊕) operator that outputs a 1 (in red) is linearly inseparable, using the two raw inputs and a line alone, from the class that outputs a 0 (in blue).

Fig. 2.1 Example of feature engineering to classify binary digits into their respective ⊕ output; introduction of the ∧ operator makes previously inseparable classes easily separable.

Once the conjunction of the inputs is introduced into the three-dimensional model as a new dimension, the classes can easily be separated by a plane in the three-dimensional space.
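As a concrete illustration of this idea (a sketch written for this report rather than code from the original figure, with purely illustrative function names), the two raw inputs can be lifted into a third, hand-engineered dimension, after which a single linear rule classifies the ⊕ output correctly:

// Hand-engineered feature: add the conjunction a AND b as a third dimension.
function engineeredFeatures(a, b) {
  return [a, b, a & b];
}

// A single linear rule in the lifted space classifies XOR correctly:
// a + b - 2*(a AND b) is 0 for {00, 11} and 1 for {01, 10}.
function classifyXor(features) {
  return (features[0] + features[1] - 2 * features[2]) > 0.5 ? 1 : 0;
}

[[0, 0], [0, 1], [1, 0], [1, 1]].forEach(function (pair) {
  console.log(pair, '->', classifyXor(engineeredFeatures(pair[0], pair[1])));
});

No linear rule on the two raw inputs alone could have produced this mapping; the added dimension is what makes the classes separable.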

This feature engineering is time consuming, depends heavily on the expertise of the engineer and may miss important features of the data that would be useful in carrying out the task at hand. Also, the complexity of feeding the output of one layer of features into the input of another grows with each successive layer of abstraction. This means truly powerful hierarchical representations of the input are difficult to engineer manually.

Instead of defining these feature vectors by hand, they can be learned. These feature vectors may then reveal information about the data, useful to carrying out the task, that is less apparent but still contributes to the robustness of the system. As the outputs of one layer of features are used as the inputs to the next, the features become more robust, as each is required to influence multiple features in the next layer. This forced generalisation helps stop the features from overfitting. It can be applied through multiple layers of features, creating complex hierarchical representations of the input. Further analysis can be found in [3].

2.2.2 Artificial Feed-Forward Neural Network

A common implementation of deep learning is an artificial feed-forward neural network, as seen in Figure 2.2. This comprises layers of non-linear functional units, commonly referred to as "neurons". A layer of these produces a linear transformation of the input vector followed by an offset (Equation 2.1) and a non-linear activation function. The activation function is usually a sigmoid function (Figure 2.3 and Equation 2.2), due to its asymptotic nature and a convenient derivative that can be expressed in terms of the function's own output (Equation 2.3). More on activation functions follows in the discussion of training below.

Fig. 2.2 Example of a feed-forward artificial neural network with two inputs, a hidden layer and one output

Fig. 2.3 Graph of a common activation function (Equation 2.2) used in the functional units of Artificial Neural Networks.

z = xᵀW + b    (2.1)

f(z) = 1 / (1 + e⁻ᶻ)    (2.2)

f′(z) = f(z)(1 − f(z))    (2.3)

These functional units are parametrised by a weight matrix W and a bias vector b on each layer. The weights are parameters of a transformation of the input vector, and the biases are an offset applied after the transformation but before the non-linear activation function.
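A minimal sketch of one such layer, written for illustration (the dimensions, values and function names are assumptions, not part of any library described here), is:

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));            // Equation 2.2
}

// One feed-forward layer: z = x^T W + b, followed by the element-wise sigmoid.
// x is the input vector, W[i][j] weights input i to unit j, b is the bias vector.
function forwardLayer(x, W, b) {
  return b.map(function (bias, j) {
    var z = bias;
    for (var i = 0; i < x.length; i++) {
      z += x[i] * W[i][j];
    }
    return sigmoid(z);
  });
}

// Two inputs, a hidden layer of two units and one output, as in Figure 2.2.
var hidden = forwardLayer([1, 0], [[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]);
var output = forwardLayer(hidden, [[1.0], [-1.0]], [0.0]);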

Page 21: hardback

2.2 Deep Learning 7

Due to the non-linear nature of each transformation, complex transformations of the input data can be achieved. Given multiple layers, transformations that would have been impossible with a single layer of hand-tuned features can be described.

Networks are commonly trained via supervised learning, a method of training where the network has a dataset of inputs x, be they text, image, video or some other raw data, and corresponding labelled outputs y which describe the correct answer for each input on the task at hand. The dataset is split randomly into a training set and a testing set. The network is then trained on the training set and its ability is assessed on the testing set.

These networks are trained with the method of back-propagation [27]. This is a form of gradient descent that has been designed specifically for artificial neural networks. It works by defining a cost function that describes the error of the network; the error is propagated backwards from the output through the layers, all the way to the input, altering the parameters along the way. Generally, for supervised learning, where an output y for a given input x needs to be learned, the squared error given in Equation 2.4 is used as the cost function.

Cx = ½ ||y − f(x)||²    (2.4)

Back-propagation is achieved by taking advantage of a non-linear activation function with a convenient derivative, such as the sigmoid function (Equation 2.2), which simplifies the system's partial derivatives (the rate of change of the error with respect to each parameter in the system). By obtaining the rate of change of the error with respect to each weight and bias in the system, each parameter is altered by a small amount scaled by a learning rate. The parameters are adjusted against the direction of the gradient in order to minimise the cost function. If this is repeated over a diverse set of inputs, a good generalised representation can be learned that produces useful classifications or representations of unseen inputs.
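As a sketch of a single update, consider one sigmoid output unit trained on one example under the cost of Equation 2.4, using the derivative of Equation 2.3. The variable names and the in-place update are illustrative assumptions rather than the full method of [27]:

// One gradient-descent step for an output unit with weights W, bias b,
// input x and target y, under the squared-error cost of Equation 2.4.
function updateOutputUnit(x, y, W, b, learningRate) {
  var z = b;
  for (var i = 0; i < x.length; i++) z += x[i] * W[i];
  var a = 1 / (1 + Math.exp(-z));            // f(z)
  var delta = (a - y) * a * (1 - a);         // dC/dz = (f(x) - y) * f'(z)
  for (var i = 0; i < x.length; i++) {
    W[i] -= learningRate * delta * x[i];     // move against the gradient dC/dW_i
  }
  b -= learningRate * delta;                 // and against dC/db
  return { W: W, b: b };
}

Hidden layers follow the same pattern, with each unit's delta accumulated from the deltas of the layer above.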

One disadvantage of neural networks is that they do not benefit from the head start most machine learning techniques receive from feature engineering. When engineering features, the engineer can use intuition to rapidly develop meaningful features; an artificial neural network may take time to develop them, as it has to slowly move down a gradient in a high-dimensional space. But given enough training data and computational resources, such networks are capable of surpassing even the most complex hand-made features. Without this foresight they can also get stuck in local minima of the gradient space. This can be helped by using other optimisation strategies such as simulated annealing.


2.3 Learning Semantic Vectors

Semantic vectors can be learned by a neural network being exposed to problems it can only solve by building layered, hierarchical vector representations of the input that are necessary to solve the problem. The input can be any raw data, and the network can learn these vectors and transformations by back-propagating the error at the output and adjusting the parameters to minimise the output of a cost function.

These vectors hold enough semantic meaning to attempt to solve any problem the network has seen in the past, so if the question "is it a cat?" is posed to a network trained on words in the English language, inputs such as "lion" and "cat" will probably have some dimension close to one, whereas the input "chair" will have a number closer to zero in that dimension. It is worth noting, however, that unless this information has been useful to the network in solving a problem it has seen in the past, it would not learn this information; a diverse input is therefore essential for complete, robust learning. For example, the network could have learned that a sufficient commonality to identify cats is that they have four legs, but then it could confuse a chair for a cat, given it has four legs. This is not a failure of the network, but rather a failure of the training data. In practice, vector dimensions are rarely this quantifiable; this example is contrived and for illustrative purposes. Getting this kind of information out of the generated vectors requires learning a linear transformation of the vector space, as seen in Section 2.5.

The idea of learning semantic vectors to represent words in the English language is introduced in [4], and developed further in [7], with the unsupervised generation of vector representations of words. The methods set out in the latter paper generate a language matrix over the words found on Wikipedia. This matrix is initialised to Gaussian noise and, after training, represents relations between words as approximately linear relationships in this high-dimensional space, with each column vector representing a word of the language. Each word in the language indexes a vector from the matrix and provides it as input to the system. The network is trained to come up with meaningful representations of the input so it can carry out the task at hand, which in [7] is to give correct English sentences a higher score than incorrect English sentences.

The use of this method purely for mapping relationships to a linear space has been improved upon, with the current state of the art set out in [21]. The evaluation of this technology found that if the vector representing the word "Man" is subtracted from the vector representing the word "King", and the resultant vector is added to the vector for "Woman", we end up with a vector that is close to "Queen".
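The vector-offset behaviour described above can be sketched as follows, with tiny made-up two-dimensional vectors standing in for real embeddings of hundreds of dimensions:

function add(a, b)      { return a.map(function (v, i) { return v + b[i]; }); }
function subtract(a, b) { return a.map(function (v, i) { return v - b[i]; }); }

// Cosine similarity: 1 means the two vectors point in the same direction.
function cosine(a, b) {
  var dot = 0, na = 0, nb = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

var king = [0.8, 0.9], man = [0.7, 0.1], woman = [0.2, 0.1], queen = [0.3, 0.9];
var result = add(subtract(king, man), woman);  // expected to lie close to "queen"
console.log(cosine(result, queen));            // high similarity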


Fig. 2.4 An example of a Recursive Neural Network, where the output is equal in length to each of its two child inputs

2.4 Composition of Semantic Vectors

The idea of composing semantic vectors is proposed in [33]. Composing word vectors together to enable representations of varying-length sentences is useful, as it allows the semantic vectors of sentences of any length to be compared.

This paper re-introduces the Recursive Neural Network (RNN) set out in [23]. A RNN (Figure 2.4) is a network whose output vector is the same length as each of its two input vectors. This means the output can be provided to the same network as an input at the next stage. Each input/output mapping uses the same weight matrix. This is useful for applying the network over a structure, such as a syntax tree, where the child nodes can be composed into an output vector representation at the parent node. This parent can then act as a child for its own parent, until a single vector representing the tree is composed from the bottom up. This works under the assumption that the composition function for children into a parent is constant for all child-parent relations.

A problem with the basic RNN is the limitation on how one word can transform the other. The vectors are concatenated before being multiplied by the weight matrix, so they cannot multiplicatively affect one another; they can only additively affect the output vector, as shown in Equation 2.5. p is the resultant parent vector, with c1 and c2 the child vectors transformed by W, all of which are the same length. Note the similarity to Equation 2.1, with f(z) coming from Equation 2.2; the bias is omitted, as it sometimes is in artificial neural networks, to simplify the system.

p = f(W[c1 : c2])    (2.5)
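A sketch of this composition, reusing the same d × 2d matrix W at every node of the tree (the names and the sigmoid choice of f are illustrative), is:

// Compose two child vectors into a parent of the same length: p = f(W [c1 : c2]).
function composeRnn(c1, c2, W) {
  var c = c1.concat(c2);                     // [c1 : c2]
  return W.map(function (row) {              // one row of W per output dimension
    var z = 0;
    for (var i = 0; i < c.length; i++) z += row[i] * c[i];
    return 1 / (1 + Math.exp(-z));           // f(z); bias omitted, as in the text
  });
}

// Composing up a tiny tree ((w1 w2) w3), reusing the same W at both nodes:
// var p = composeRnn(w1, w2, W); var root = composeRnn(p, w3, W);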


Words like 'not' are inherently unary operators and have no real standalone meaning. They could be viewed more as matrices than vectors, which should transform the semantic vector of another word. For example, 'excited' would have a specific meaning; 'not excited' would then be linearly transformed across the vector space by the word 'not'. This can conceptually be achieved through the use of Matrix-Vector Recursive Neural Networks (MV-RNNs).

MV-RNNs [32] seek to solve this issue by assigning each word a matrix, as well as a vector, that can transform the sibling vector. These matrices can be learned in the same way as the vectors, and enable words like "not" to transform other words into a different area of the space before the two are combined, allowing more flexible representations to be learned. However, the number of parameters is squared by the introduction of the matrices, and so is the dimensionality of the gradient space, so these matrices are difficult to learn. A lower-parameter version of this approach is required.

Recursive Neural Tensor Networks (RNTNs) [31] are a development of the MV-RNN. They are RNNs whose composition function adds a term in which the concatenation of the two child vectors, transposed, [c1 : c2]ᵀ, is multiplied by a square slice Vi of a parameterised rank-3 tensor, then multiplied by the concatenation [c1 : c2] again (Equation 2.6). This results in a scalar value, and it is repeated for each slice of the tensor until a vector as long as a child is produced. This is added to the product of the weight matrix W and the concatenated children, as in the RNN. Most importantly, this gives the children multiplicative effects on one another without squaring the parameter set. The effect each position of the vector has on the other is controlled by the value at a given position in the tensor. This results in certain dimensions of the semantic word vectors relating to effects on siblings and others relating to pure semantic content.

pi = f([c1 : c2]ᵀ Vi [c1 : c2] + Wi [c1 : c2])    (2.6)
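A sketch of Equation 2.6 follows, with V represented as an array of d square 2d × 2d slices and W as a d × 2d matrix; it is purely an illustration of the formula rather than the Stanford implementation:

// p_i = f( [c1:c2]^T V_i [c1:c2] + W_i [c1:c2] ), computed slice by slice.
function composeRntn(c1, c2, V, W) {
  var c = c1.concat(c2);                     // [c1 : c2], length 2d
  return V.map(function (slice, i) {
    var tensorTerm = 0;
    for (var a = 0; a < c.length; a++) {
      for (var b = 0; b < c.length; b++) {
        tensorTerm += c[a] * slice[a][b] * c[b];   // quadratic form for slice i
      }
    }
    var linearTerm = 0;
    for (var a = 0; a < c.length; a++) {
      linearTerm += W[i][a] * c[a];                // the plain RNN term
    }
    return 1 / (1 + Math.exp(-(tensorTerm + linearTerm)));
  });
}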

2.5 Transformation to a Sentiment Space

If the semantic space contains all the semantic information that was relevant to the network during its training, and sentiment was relevant during training, then the sentiment information must be contained in the semantic space. A projection into a sentiment space can then be performed to reveal the sentiment information.

A sentiment space with five interdependent dimensions is proposed: very negative, negative, neutral, positive and very positive. A vector can be transformed from the semantic space to this space by a matrix that is learned in the same way as the vectors and is trained on labelled data. So, given a vector representing "this movie is terrible" in the semantic vector space, the matrix would be trained to output a value close to 1 in the very negative dimension and zeros in all the other dimensions.

Fig. 2.5 Example of the transformation of a semantic vector into the sentiment space
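A sketch of the projection into the five sentiment dimensions described above follows; the projection matrix name Ws is an assumption, and the softmax normalisation is added here only so the output reads as class proportions:

// Project a semantic vector into the five-class sentiment space and normalise.
// Ws is a 5 x d matrix learned along with the other parameters.
function toSentimentSpace(semantic, Ws) {
  var scores = Ws.map(function (row) {
    return row.reduce(function (sum, w, i) { return sum + w * semantic[i]; }, 0);
  });
  var exps = scores.map(Math.exp);
  var total = exps.reduce(function (a, b) { return a + b; }, 0);
  // [very negative, negative, neutral, positive, very positive]
  return exps.map(function (e) { return e / total; });
}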

2.6 Training

In order to train all the parameters (the semantic vectors, the composition function and the semantic-to-sentiment projection matrix), a large amount of labelled data is required. The Stanford NLP department collected this data by utilising Amazon's Mechanical Turk, a service for crowdsourcing tasks that are difficult for a computer to carry out but take moments for a human to perform.

They use a common dataset acquired from the website Rotten Tomatoes in the form of movie reviews. The dataset consists of 10,000 sentences, which are parsed into their syntax trees by technologies sourced elsewhere in the Stanford NLP department, resulting in 200,000 subtrees, or phrases. Each phrase is presented without context to a person on Mechanical Turk, who gives it a score from very negative to very positive; these scores are then translated into sentiment vectors that can be used to train all the parameters of the network.


Chapter 3

Implementation

This chapter outlines the processes involved in creating the three technologies. Firstly, a Java server to host the Stanford CoreNLP library, described in Section 3.1. Second, a Firefox extension that parses any website for content and then, using the technology outlined in Section 3.1, performs coloured sentiment highlighting of the text; it is described in detail in Section 3.2. Finally, the location-based audio-visual technology titled "Sentiment Rain" is presented in Section 3.3.

3.1 The Stanford CoreNLP Java Library

The technologies described in the previous chapter are available as part of the Stanford CoreNLP library [18]; specifically, the technologies described in [31] are available as its sentiment analysis package.

3.1.1 Configuration

The Stanford sentiment library is available in Java and hosted in the Maven Central repository [20]. This allows a project to be created with the library listed as a dependency (Fig 3.1). This keeps the code easier to manage and allows updates to be introduced to the project by changing the version number and rebuilding. This is particularly useful as the models are likely to be upgraded regularly by the Stanford NLP department.

The goal is to create a network-accessible HTTP endpoint that allows the building of platform-agnostic, novel implementations with Stanford CoreNLP's sentiment technologies. The first task is to choose a server stack that facilitates such an endpoint across the Internet.

As the library is hosted in the Maven Central Repository, a Java server with a Maven build process is the clear choice. The Spring MVC web application and RESTful web service framework [34] satisfies this requirement: it has a Maven build process and manages dependencies, while also providing much of the boilerplate code involved in creating a web server.

<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.5.1</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.5.1</version>
  <classifier>models</classifier>
</dependency>

Fig. 3.1 Stanford CoreNLP Maven dependency listing

As shown in Fig 3.2, Spring annotations are imported to inject into the code-base the boilerplate involved in creating a web server, along with some standard Java imports and the Stanford CoreNLP libraries, with one dependency on the Efficient Java Matrix Library's SimpleMatrix.

As shown in Fig 3.3, the use of the CoreNLP sentiment library requires a tokenizer and a sentiment pipeline. The tokenizer splits any input string into lexical tokens with the 'tokenize' annotator, and these are grouped into sentences by the 'ssplit' annotator. The result can then be passed into the sentiment pipeline: the 'parse' annotator produces the syntax tree, followed by the 'sentiment' annotator, which uses the technologies described in the earlier chapter to generate a five-dimensional sentiment analysis for the tree. The listing in Fig 3.3 shows how the server initialises these annotators for use in responding to a request.

The server is required to respond to the request in Fig 3.4 with a "Content-Type: application/json" response, shown in Fig 3.5.

As shown in Fig 3.6, to provide this response the endpoint receives the 'lines' parameter from the request via the '@RequestParam' annotation, bound to 'lines'. Each line is passed through each of the CoreNLP annotators, and a weighted average of the resultant sentiment vector is calculated over the sentences in the line. Only one sentence should normally be present, but in the implementation in Section 3.3 the averaging is useful, as the client requires an overall sentiment for multiple sentences. However, the more sentences that are averaged, the less the sentiment can be relied on, so this functionality is used sparingly and only where appropriate. The server responds as defined in Fig 3.5, i.e. each line followed by a number between 0 and 1, with 1 meaning positive and 0 meaning negative.


package server;

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.HashMap;
import java.util.List;
import java.util.Properties;
import java.io.*;

import org.ejml.simple.SimpleMatrix;

import edu.stanford.nlp.sentiment.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

Fig. 3.2 Server library imports


@Controller
public class SentimentController {

    StanfordCoreNLP tokenizer;
    StanfordCoreNLP pipeline;

    public SentimentController() {
        Properties tokenizerProps = new Properties();
        tokenizerProps.setProperty("annotators", "tokenize, ssplit");
        this.tokenizer = new StanfordCoreNLP(tokenizerProps);

        Properties pipelineProps = new Properties();
        pipelineProps.setProperty("annotators", "parse, sentiment");
        pipelineProps.setProperty("ssplit.isOneSentence", "true");
        pipelineProps.setProperty("enforceRequirements", "false");
        this.pipeline = new StanfordCoreNLP(pipelineProps);
    }

Fig. 3.3 Server initialisation code

GET /sentiment?lines=what+a+bad+film&lines=actually+I+really+like+it

Fig. 3.4 Required API endpoint

{"0":{ "sentiment":0.2580948040808713,

"line":"what a bad film" },"1":{ "sentiment":0.6893179757900284,

"line":"actually I really like it" }}

Fig. 3.5 Sentiment Server Response


    @RequestMapping("/sentiment")
    public @ResponseBody HashMap<Integer, HashMap<String, Object>> sentiment(
            @RequestParam(value = "lines", required = true) List<String> lines,
            Model model) {

        HashMap<Integer, HashMap<String, Object>> response =
            new HashMap<Integer, HashMap<String, Object>>();

        for (int i = 0; i < lines.size(); i++) {
            response.put(i, new HashMap<String, Object>());

            Annotation annotation = tokenizer.process(lines.get(i));
            pipeline.annotate(annotation);

            double sentiment = 0;
            int sentenceCount =
                annotation.get(CoreAnnotations.SentencesAnnotation.class).size();

            for (int j = 0; j < sentenceCount; j++) {
                CoreMap sentence =
                    annotation.get(CoreAnnotations.SentencesAnnotation.class).get(j);
                Tree tree =
                    sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
                SimpleMatrix vector =
                    RNNCoreAnnotations.getPredictions(tree);

                sentiment += (vector.get(1) * 0.25 + vector.get(2) * 0.5
                    + vector.get(3) * 0.75 + vector.get(4)) / (double) sentenceCount;
            }

            response.get(i).put("line", lines.get(i));
            response.get(i).put("sentiment", sentiment);
        }
        return response;
    }
}

Fig. 3.6 The sentiment server endpoint’s code


Fig. 3.7 Firefox extension that highlights the sentiment of paragraphs of text.

3.1.2 Deployment

A first attempt at deployment was to use a free instance on Heroku [14]. Heroku is a cloud Platform-as-a-Service that supports a wide variety of languages and configurations; it allows for easy deploys via git [11] from the command line and is free for limited use. This option proved unavailable, however, as the Stanford CoreNLP models are over 300MB in size and 300MB is the maximum allowed size on Heroku.

An alternative is to deploy on an Amazon EC2 server [1]. An Ubuntu server operates on AWS, with the code pulled via git, and is made available at a static IP address for responding to requests. The domain conorbrady.com, whose DNS records are managed by Cloudflare [6], is used to create the domain name stanford-nlp.conorbrady.com, which points to the EC2 server using a DNS A record. An endpoint as described in Fig 3.4 is made available on this server.

3.2 Sentiment Firefox Extension

The first application built using the previously mentioned server is a Firefox extension [10]. This extension runs in the Firefox browser and highlights positive sentences in green and negative sentences in red, as shown in Fig. 3.7.

This uses content scripts [8]: scripts provided to the browser by the extension on page load, which modify the content of the page.


/* <p> elements inside <article> elements */
article p
/* <p> elements inside elements whose class or id
   contain article or content as a substring */
[class*=content] p
[id*=content] p
[class*=article] p
[id*=article] p
/* <p class="tweet-text"> elements, for Twitter */
p.tweet-text
/* <p> elements inside elements with class="body",
   for the webpage used in the results section */
.body p

Fig. 3.8 Firefox extension content selectors

3.2.1 Selectors

Looking for consistency across websites is impossible: every website has its own structure and nomenclature, entirely at its own discretion. So there is no better way to pick out relevant text than to compile a list of selectors [16]. Selectors are the means by which Javascript can reference objects in the HTML of a webpage. The comments above the selectors in Fig 3.8 denote which elements they select in the HTML.

This is an incomplete list, but no complete list exists. If the selection is too general, unnecessary information will be sent to the server for analysis, wasting resources and slowing down an already slow, processing-intensive operation. jQuery wildcard selectors are employed to increase the system's robustness; the benefits are documented in the results chapter. Elements that contain other paragraphs, textareas or scripts are then removed, as these cause issues and are not required by the extension to provide its functionality.

3.2.2 Sentence Boundaries

Once the elements are selected, the sentences are extracted from the HTML. A record is kept of where each sentence starts and ends so the paragraph can be reconstructed on reinsertion. This is achieved by gathering a list of sentence markers, acquired by testing the paragraph against the two regular expressions [25] shown in Fig 3.9.

The first regex finds possible sentence boundaries. As regex look-behinds are unsupported in Javascript, a second pass has to be performed once a candidate has been identified. The second pass tests for strings such as 'i.e.', 'e.g.', 'Mr', 'Dr' and so forth; upon a match of the second regex the candidate is rejected. The start and end of the paragraph (0 and length) are also added to the sentence markers, as they will fail to match the first regex but are nonetheless sentence boundaries.

/([.?!])\s*(?=[A-Z]|<)/g
/(?:\w\.\w)|(?:[A-Z][a-z])/g

Fig. 3.9 Sentence boundary detection regexes

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3600
Access-Control-Allow-Headers: x-requested-with

Fig. 3.10 Headers required to enable Cross-Origin Resource Sharing
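A sketch of the two-pass boundary detection, using the regexes of Fig 3.9, is shown below; the surrounding logic and the way the second regex is applied to the preceding token are illustrative reconstructions rather than the extension's exact code:

function sentenceMarkers(text) {
  var boundary = /([.?!])\s*(?=[A-Z]|<)/g;       // first pass: candidate boundaries
  var rejection = /(?:\w\.\w)|(?:[A-Z][a-z])/;   // second pass: abbreviations ("i.e.", "Mr")
  var markers = [0];                             // start of the paragraph is a marker
  var match;
  while ((match = boundary.exec(text)) !== null) {
    // Look at the short token preceding the punctuation; if it resembles an
    // abbreviation ("i.e", "e.g", "Mr", "Dr"), reject the candidate.
    var token = text.slice(0, match.index).split(/\s+/).pop();
    var isAbbreviation = token.length <= 3 && rejection.test(token);
    if (!isAbbreviation) {
      markers.push(match.index + 1);             // split just after the punctuation
    }
  }
  markers.push(text.length);                     // end of the paragraph is a marker
  return markers;
}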

3.2.3 Server Interaction

With these markers acquired, the paragraph is split into substrings at the sentence markers and all HTML tags are removed, leaving only the text carrying the sentiment information.

The sentences are then sent to the server via an HTTP GET request. Each sentence is passed as a 'lines' GET parameter in the request to the server.
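A sketch of such a request from the content script is given below, assuming XMLHttpRequest is available, using the endpoint of Fig 3.4 and the domain configured in Section 3.1.2; the callback shape follows the response format of Fig 3.5:

function requestSentiment(sentences, callback) {
  var query = sentences.map(function (s) {
    return 'lines=' + encodeURIComponent(s);   // one 'lines' parameter per sentence
  }).join('&');
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'http://stanford-nlp.conorbrady.com/sentiment?' + query);
  xhr.onload = function () {
    callback(JSON.parse(xhr.responseText));    // keyed by index, as in Fig 3.5
  };
  xhr.send();
}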

When an HTTP request is made from a browser via Javascript, either the domain of the server responding to the request must be the same as the current website domain, or the server receiving the request must support Cross-Origin Resource Sharing (CORS) [9]. The server has to be modified to send the appropriate headers, as it is impossible to guarantee the domain of the source of the request: the plugin runs in the browser and could be applied to virtually any site. The response headers in Fig 3.10 are added to the server's response, and the browser then allows the responses to pass its security checks.

3.2.4 Colouring

The sentence markers are used to delimit <span> elements, which have no impact on the webpage apart from providing the facility to apply custom CSS to a precise section. These are given inline styling with a colour value computed by the function in Fig 3.11.

This results in a black colour for 0.5, indicating neutral sentiment, becoming green at 1.0 and red at 0.0, with a gradient between the three values. It is important to leave the functionality of the page unaffected, as changing any <a> tags could render the site unusable. The insertion of <span> elements preserves the site's functionality, as all other content remains unchanged.

function getRGB(sentiment) {
    var red = Math.floor(
        Math.max(0, -510 * sentiment + 255)).toString(16);
    var green = Math.floor(
        Math.max(0, 510 * sentiment - 255)).toString(16);
    return '#' + ( red.length == 2 ? red : "0" + red )
               + ( green.length == 2 ? green : "0" + green ) + '00';
}

Fig. 3.11 Function for calculating the colour from a sentiment value

3.2.5 HTTPS

This will not work for sites using HTTPS [26]. The rules regarding CORS state that any request must be made over the same protocol as the current browser location. This means that websites like Twitter, which exclusively use HTTPS, require HTTPS to be set up on the endpoint for the extension to work on those sites.

Cloudflare offer a service that proxies traffic through their servers, offering an HTTPS connection to the client while connecting to the sentiment server via HTTP. This means configuring HTTPS or purchasing a certificate from a Certificate Authority is unnecessary. Cloudflare is configured to offer the service, and the extension is modified to make its requests via HTTPS when on a secure website such as Twitter (Fig. 3.12).

3.2.6 Overview Chart

To provide an overview of the sentiment of the page, Highcharts [15] is used: a Javascript library that allows for the easy creation of simple charts. As each sentence on the page is processed, it increments the histogram bucket corresponding to its sentiment value. This creates a histogram reflecting the number of sentences at each level of sentiment.
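A sketch of the bucketing behind the chart follows, assuming five bins and a Highcharts chart object named chart; the bin count and variable names are assumptions:

var buckets = [0, 0, 0, 0, 0];                 // very negative ... very positive

function addToHistogram(sentiment) {
  var index = Math.min(buckets.length - 1,
                       Math.floor(sentiment * buckets.length));
  buckets[index] += 1;
  // chart.series[0].setData(buckets);         // redraw the Highcharts histogram
}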

3.3 Sentiment Rain

This section describes a separate use of the sentiment analysis engine: mapping Tweets over the course of a time frame, specifically the Dublin Marathon 2014 (Fig. 3.13), and then live over San Francisco.


Fig. 3.12 Plugin running on Twitter over HTTPS

Fig. 3.13 Sentiment Rain scenario showing the sentiment of Tweets during the Dublin Marathon in 2014


http://graisearch.scss.tcd.ie/query/Graisearch/sql/:querystring

Fig. 3.14 GRAISearch database endpoint

3.3.1 The TCD GRAISearch Dataset

A database is made available on a read-only basis by the Department of Statistics in Trinity College. This database is exposed on the endpoint in Fig 3.14, once valid credentials are supplied using HTTP Basic Auth. HTTP Basic Auth is a simple form of authorisation where a username and password combination is encoded into an HTTP header. As this is encoded rather than encrypted, it is potentially insecure; an HTTPS connection would resolve this problem, but at the time of writing HTTPS is unavailable on the GRAISearch server, which returns a '503 Service Unavailable' when requested over HTTPS.

The dataset contains numerous geocoded, timestamped Tweets collected during the Dublin Marathon. By leveraging this dataset with the sentiment analysis server, an interactive experience of Twitter activity during the day of the Marathon is created.

3.3.2 Server Stack

The requirements of the server are minimal. The majority of the processing is done client side, with minimal business logic occurring on the server; it acts as a go-between for the client, the GRAISearch dataset and the sentiment server. No database is required, as the data is streamed from the GRAISearch database. All that is needed is a thin layer to serve the page and to provide some endpoints from which the Javascript can get the information it needs for the visualisation (the Tweet objects).

The Sinatra Ruby micro-framework [30] is an appropriate choice: it is lightweight and quick to get up and running. It also defaults to productive technologies such as CoffeeScript (a terse, expressive language that compiles to Javascript for the browser), SASS and HAML (similar technologies for CSS and HTML respectively). By comparison, a large traditional MVC framework such as Ruby on Rails would cost considerable set-up time for unneeded features.

Heroku is used for hosting, as this time the server has a sufficiently small footprint and is more typical of what Heroku is generally used for. This allows the server to be updated easily with a git push to Heroku from the command line. Heroku also provides the option to create a subdomain of herokuapp.com that resolves to the server's IP address. sentiment-rain.herokuapp.com is chosen, and a CNAME DNS record is set up on Cloudflare to point from sentiment-rain.conorbrady.com to the Heroku-supplied domain. The server is now available at http://sentiment-rain.conorbrady.com/

{
  "id": "526661075383894016",
  "lat": 53.72362511,
  "lon": -6.33728538,
  "text": "Best of luck to our online editor @JaneLundon running the @dublinmarathon today. We’re so proud of you Jane! @Elverys ",
  "link": "http://t.co/RfAKcz8FsF",
  "created_at": "1414400759000",
  "sentiment": 0.6815987306407396
}

Fig. 3.15 Sample required /scenario_tweets resource

3.3.3 Map

A powerful, highly customisable Javascript browser mapping library is needed. Google Maps was the original path of research, but Mapbox [19] offers more options for customising the map view and drawing shapes over it. Mapbox is built on top of Leaflet, an expressive map-drawing library; it is the choice of Foursquare, Pinterest and Evernote, to name a few, and it is free to use for the purposes of this project.

3.3.4 Scenario-Tweets Resource

A resource is provided at /scenario_tweets?since=:since&limit=:limit. This returns a number (:limit) of Tweets since a certain timestamp (:since) in a JSON format; an example is shown in Fig 3.15. To implement this endpoint, the GRAISearch dataset must be queried via the URL described in Fig 3.14.

Requesting:

• The ID as defined by Twitter

• The coordinates of the tweet

• The text of the tweet

Page 39: hardback

3.3 Sentiment Rain 25

Fig. 3.16 Sentiment Rain architecture

• The time the tweet was created at

Filtered by:

• Only English tweets

• That have no null values in any of the requested fields

• Created since the time defined on the request

• Limited by the limit on the request

Upon the GRAISearch server's response, URLs have to be parsed out of all the text fields to obtain the raw tweet text. The texts are then encoded into an HTTP GET request and sent to the sentiment server for analysis, and the results are folded back into the tweet objects for returning to the client. The architecture of this system is shown in Fig. 3.16.

Fig. 3.16 Sentiment Rain architecture

3.3.5 Javascript Visualisation

To visualise the data, a clock is initialised at 8:00am on the morning of the Dublin Marathon 2014. The clock progresses at 5 minutes of simulated time per second of real time and ticks 12 times per real-time second.
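A sketch of this timing is given below; the start date is an assumed value for the marathon morning, and view and dataSource stand for the collaborators described in the following paragraphs:

var TICKS_PER_SECOND = 12;
var SIM_STEP_MS = (5 * 60 * 1000) / TICKS_PER_SECOND;   // 25 simulated seconds per tick

var simulatedTime = new Date('2014-10-27T08:00:00').getTime();  // assumed start time

setInterval(function () {
  simulatedTime += SIM_STEP_MS;
  // view.tick(simulatedTime);
  // dataSource.tick(simulatedTime);
}, 1000 / TICKS_PER_SECOND);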


Fig. 3.17 Tweet selected by clicking a circle

On each tick it messages the view and the data source separately and allows them to carry out their appropriate action, depending on their state.

The data source works simply: if it is not already waiting on a network request to return tweets, it checks the current simulated time against the latest tweet it has already received; if this reveals that the visualisation will run out of new tweets within 6 seconds of real time, it asynchronously requests the next batch of tweets from the server and feeds them into the view.

The view visualises the tweets it is aware of at the time; it is the data source's job to feed it with tweets to visualise. When the view receives a tweet it wraps it in a TweetView object. These objects represent the circles on the map and any accompanying interactivity. Upon creation, a click listener is attached to the circle that produces the tweet overlaid on the map when a mouse click event occurs on it, as shown in Fig. 3.17. This is achieved using Twitter's Javascript SDK, passing the Tweet ID to a call along with a webpage element in which to render it. This element has id 'frame' and exists for the sole purpose of containing these tweets.

On each clock tick, the view passes the timestamp to each TweetView; it also removes any TweetViews from its set that have destroyed themselves. This destruction occurs when they detect that they are no longer visible and will remain invisible in the future.

Fig. 3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location

The TweetView objects shrink their circle's size and modulate its border radius with respect to the supplied timestamp and the time at which the tweet was created on Twitter. A certain time after creation they have no size and call a destroy method on themselves, which instructs the parent view to stop messaging them and lets the garbage collector clean them up, as described previously.
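A sketch of a TweetView's per-tick behaviour is shown below; the Leaflet-style circle object, the lifetime constant and the radius scale are all assumptions made for illustration:

function TweetView(tweet, circle) {
  this.tweet = tweet;
  this.circle = circle;                        // a Leaflet/Mapbox circle on the map
  this.lifetimeMs = 10 * 60 * 1000;            // visible for ten simulated minutes
  this.destroyed = false;
}

TweetView.prototype.tick = function (simulatedTime) {
  var age = simulatedTime - Number(this.tweet.created_at);
  var remaining = 1 - age / this.lifetimeMs;
  if (remaining <= 0) {
    this.destroyed = true;                     // the parent view drops destroyed tweets
    return;
  }
  this.circle.setRadius(200 * remaining);      // shrink towards zero over the lifetime
};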

3.3.6 Sound

Given the visual nature of this project, it is appropriate to furnish it with a complementary soundscape. For this, the Audiolet Javascript library [2] is used. It allows sample-altering and frequency-producing nodes to be connected in a network to produce sound effects and music synthesis.

After some experimentation and research, the configuration shown in Fig. 3.18 was produced. This generates a low tone for negative sentiment and a high tone for positive sentiment. The tones are selected from a C minor scale with the 2nd, 4th and 7th degrees removed: these are not dominant notes of the scale, and as notes are selected with uniform randomness, it is better that the selection favour the more dominant notes to reinforce the scale.
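A sketch of the note selection, with approximate equal-temperament frequencies, is shown below; the octave shift for positive sentiment is an assumption about how the "low" and "high" tones are realised:

// C minor with the 2nd, 4th and 7th degrees removed: C, Eb, G, Ab (around C3).
var SCALE_HZ = [130.81, 155.56, 196.00, 207.65];

function noteForSentiment(sentiment) {
  var base = SCALE_HZ[Math.floor(Math.random() * SCALE_HZ.length)];
  return sentiment > 0.5 ? base * 2 : base;    // positive tweets an octave higher
}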


Fig. 3.19 An example of an ADSR envelope

From there the signal is connected to a low-pass filter that is modulated by a square wave. A low-pass filter removes high frequencies and allows lower frequencies to pass; when the cut-off frequency is varied, or modulated, this produces a rhythmic effect. The modulating square wave's frequency is controlled by the sentiment, giving a slow modulation for negative and a fast modulation for positive, between 1 Hz and 16 Hz; this modulates the cut-off of the low-pass filter between 5 kHz and 9 kHz, resulting in a rhythmic effect.

A device known as an ADSR (Attack, Decay, Sustain, Release) envelope produces the signal shown in Fig 3.19. This is triggered the moment the tweet is tweeted, then summed with another modulating square wave whose frequency is offset from that of the low-pass filter's modulator by enough to create a rhythm between the two. The summation of the ADSR and this square wave produces a signal that begins with the transient of the ADSR combined with the choppiness of the square wave, but after the release holds with just the square wave.

This summed signal is combined in a multiplier with the previous signal from the low-pass filter, acting on the filter's output as an amplitude control. This creates an enveloped sound with a transient followed by a rhythmic amplitude at odds with the already-present modulation of the low-pass filter.

The final two stages give spatial awareness to the sound by attenuating it with respect to the tweet's distance from the map centre, and panning it with respect to its x position on the map.


Fig. 3.20 Live view of tweets over San Francisco

3.3.7 Live

As a final experiment, a live stream of tweets from Twitter is connected to showcase the tweets over San Francisco in real time. This is fed directly from Twitter's API, as opposed to the dataset made available from TCD GRAISearch, which means Twitter's API policies, such as rate limiting and requesting based on location, had to be adhered to. Beyond that, the approach and technologies are similar, and it works successfully [29].


Chapter 4

Experimental Results

4.1 Stanford Sentiment Server

This server performs as expected, with roughly 200 ms round-trip time per sentence submitted in the request. Measuring the quality of the sentiment analysis, the server performs as well as the Stanford NLP department's live demonstration [24] in all comparisons. In future it is expected that Stanford's demonstration will perform better, as it has access to the most up-to-date training. No replication or load balancing is in place, so this implementation will not scale to large numbers of users requesting simultaneously; but as no shared state is held on the server, the task of replication and load balancing is straightforward when required.

4.2 Firefox Extension

The Firefox extension has 40% recall of relevant content with the basic Javascript selector article p. By augmenting the selectors with jQuery wildcard selectors (as explained in Section 3.2.1), the recall rises to 88% of relevant content. These selectors search anywhere in the class or id attribute of elements in the webpage for the string "article" or "content", instead of trying to match the whole name. Given the unpredictable way in which websites are constructed, this works exceptionally well, as demonstrated in Fig. 4.1.

The sentence boundary splitting appears robust, with 95% precision in detecting sentence boundaries across Fig. 4.2, Fig. 4.3 and Fig. 4.4. The approach used has independently been estimated at 95% effective [22].

Each paragraph of content requires around three seconds to process on an idle server. This processing is carried out serially, so the processing time for a page is a linear function of the number of paragraphs, with paragraph length taken into account. As mentioned


Website                              Selector
http://www.vulture.com/              article p
http://www.theglobeandmail.com/      article p
http://www.detroitnews.com/          article p
http://theadvocate.com/              article p
http://wegotthiscovered.com/         article p
http://www.newyorker.com/            article p
http://qctimes.com/                  article p
http://www.rollingstone.com/         article p
http://www.dailynews.com/            article p
http://www.rogerebert.com/           article p
http://www.thestar.com/              [class*=article] p
http://www.abc.net.au/               [class*=article] p
http://www.forbes.com/               [class*=article] p
http://www.reviewjournal.com/        [class*=content] p
http://www.nj.com/                   [class*=content] p
http://thepopcornjunkie.com/         [class*=content] p
http://baretnewswire.org/            [class*=content] p
http://www.tvinsider.com/            [class*=content] p
http://filmink.com.au/               [id*=article] p
http://www.vox.com/                  [id*=article] p
http://theyoungfolks.com/            [id*=content] p
http://screenrant.com/               [id*=content] p
http://www.cityweekly.net/           fail
http://www.reelingreviews.com/       fail
http://www.ericdsnider.com/          fail

Fig. 4.1 Random sample of websites to test effectiveness of content selection from extension


Fig. 4.2 Short movie review [5], colour coded based on sentiment

previously, due to limitations on the server this does not scale well, and beyond a certain number of simultaneous users performance will become sluggish.

Fig. 4.2 to Fig. 4.5 show the effect of the extension on a series of short, opinionated movie reviews. Because the library was trained on movie reviews, this is an ideal demonstration of its abilities, and the reader is encouraged to evaluate the effectiveness for themselves. The figures show clear parallels between sentiment and colour coding. It can also be seen in these excerpts, and in that of Fig. 4.6, that no functionality of the page has been affected: all links remain clickable.

The chart overview offers little insight into the page's sentiment. Simply counting the number of positive and negative sentences reveals little: if one sentence carries much more weight than the others, it should contribute more to the histogram, and without such weights a faithful representation is impossible. Hidden text on the page will also contribute to the histogram's state; when only colouring the text this does not present a problem, as the user cannot see the hidden text to begin with. The problem this chart attempts to solve, deciphering the importance and weight of a sentence relative to its surrounding content, is complex and likely an ongoing research effort in Stanford's NLP department.

Fig. 4.6 shows that an article on Wikipedia, a website with little to no opinion, apparently still carries sentiment information. It is the author's conviction that this is due to two forms of sentiment: the first being the sentiment of the writer, which is evident in the movie reviews, and the


Fig. 4.3 Short movie review [5], colour coded based on sentiment

Fig. 4.4 Short movie review [5], colour coded based on sentiment

Fig. 4.5 Short movie review [5], colour coded based on sentiment


Fig. 4.6 Excerpt from the Wikipedia page of Dublin, Ireland


second being the sentiment perceived by the reader, as is apparent in this article. Take the sentence "In response to Strongbow's successful invasion ... pronounced himself Lord of Ireland". The sentiment here is subjective: some people would consider it positive and some negative, yet it does not represent an opinion of the writer. This is likely a failure in the training dataset, with bias and personal outlook creeping into the model. It could be due to an under-trained network, but evidence suggests the contrary, as this theme is consistent across Wikipedia.

The measure under which this library achieved 85% correct classification ignored the neutral class [31]. As evidenced by Fig. 4.6, it could be argued that a neutral class is as important as positive and negative, and should be included in future state of the art benchmarks.

A packaged extension can be found on the CD accompanying this report, in the folder entitled "demonstration".

4.3 Sentiment Rain

Sentiment Rain is a success on the desktop browsers Safari and Chrome on OS X: the frame rate is smooth and the colours and tones reflect the sentiment. It struggles on Firefox on OS X due to the manner in which that browser allocates its threads. In Firefox only one thread is available to the Javascript engine per tab, so when a network call is made it disrupts the user interface's processing, which manifests as stuttering in the browser. Safari and Chrome handle this better, with networking not interfering with the user interface. Internet Explorer remains untested for compatibility.

As a proof of concept this is a sufficient result. A screenshot is provided in Fig. 4.7; as a screenshot is a poor demonstration of this implementation, the reader is encouraged to investigate the live demos in the Chrome browser [12], available at [28] and [29]. A video demonstration is also included on the CD accompanying this report, in the folder entitled "demonstration".


Fig. 4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014


Chapter 5

Future Work

5.1 Sentiment Analysis and Semantic Vectors

The research underlying this project concerns learning semantic vector representations and how to extract useful information from them. As the semantic vectors' axes are undefined, a transformation must be performed onto known axes to gain insight into the information contained within the space. Sentiment is a poorly defined concept, as it is not always objectively clear what the sentiment of a sentence is; for example, "I love when people die in war" receives a sentiment rating of 0.54. This sentence contains sentiment, but in multiple, separate dimensions, and when the space is projected down to a single dimension the sentiment inherent in the sentence is lost.

If clear, objective axes were defined and a projection learnt onto them, the network could yield more useful insight into the semantic meaning of the vector representations. If a multidimensional sentiment model could be proposed, such as the sentiment of the writer on one axis and the perceived sentiment of the reader on another, the network might be trained to produce a more robust, context free sentiment analyser than those seen so far. A model such as this is hard to define; the proposal above may not be a valid model, as the sentiment of a reader is a subjective concept and thus may require a different transformation based on the political and moral viewpoint of that reader.
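As a rough sketch of the idea (the matrices W_w and W_r and the class count k below are illustrative assumptions, not drawn from [31]): where the current model learns a single projection of a composed sentence vector onto sentiment classes,

    y = \mathrm{softmax}(W_s\, a), \qquad W_s \in \mathbb{R}^{k \times d},\; a \in \mathbb{R}^{d},

the two-axis proposal would learn one projection per dimension of interest,

    y_{\mathrm{writer}} = \mathrm{softmax}(W_w\, a), \qquad y_{\mathrm{reader}} = \mathrm{softmax}(W_r\, a),

with W_w and W_r trained against separate labels for the sentiment intended by the writer and the sentiment perceived by the reader.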

It would be interesting to investigate what other meaningful projections could be achieved from the semantic vectors; a model where sentiment is decomposed into two dimensions may be just the beginning. The field of sentiment analysis could be generalised into one of semantic analysis, with multiple dimensions each revealing their own information alongside sentiment. Multiple transformation matrices could encode different viewpoints and personal biases, allowing insight into the personal semantic information a particular reader would take from a piece of text.


5.2 Stanford Sentiment Server

The sentiment server could be spread across multiple machines with caches and load balancers to enable scalability. Such a system would scale readily, as there is no interdependence of stored data, so load balancing would be straightforward. As this was merely a proof of concept, the non-scalable server was sufficient for present purposes.

The biggest improvement in scalability would come from removing the server entirely and replacing it with a similar library written in Javascript. This would allow much cheaper scaling, as no dedicated server would be required for sentiment processing and each user would perform the sentiment analysis on their own machine. Such a library could be trained on Stanford's sentiment treebank [31] to achieve similar results.

5.3 Firefox Extension

Moving sentence splitting to the server would prove much more reliable, as CoreNLP contains functionality to split sentences more robustly than anything Javascript libraries can offer. This is too significant an architectural redesign to undertake at this time, but it should be considered early on in any future redesign.

The other issue concerns reliable extraction of pertinent text. This problem has no clear solution, as each webpage potentially has a different structure to any seen before. One possible approach would be to provide a facility for the user to click paragraphs of interest, derive a selector that matches them, and add that selector to the list for future visits.
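A minimal sketch of how such a facility might derive a selector from a clicked paragraph is shown below; the heuristic and the saveSelectorForSite helper are hypothetical.

    // On click, walk up from the chosen paragraph to the nearest ancestor with a
    // class or id and derive a wildcard selector from it.
    document.addEventListener('click', function (event) {
      var node = event.target;
      while (node && node.nodeName !== 'P') {
        node = node.parentElement;
      }
      if (!node) return;
      for (var parent = node.parentElement; parent; parent = parent.parentElement) {
        var attr = parent.id ? 'id' : 'class';
        var name = parent.id || (parent.className || '').split(' ')[0];
        if (name) {
          var selector = '[' + attr + '*=' + name + '] p';
          saveSelectorForSite(window.location.hostname, selector); // hypothetical persistence helper
          break;
        }
      }
    });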

Increasing interactivity in the extension by allowing users to correct mislabelled sentences, thereby augmenting the training set, would prove invaluable. As this is a deep learning technology, the more data the network has to learn and generalise from, the more powerful it becomes.

5.4 Sentiment Rain

Caching on the backend, in both the live and scenario models, would allow for large gains in performance and scalability. In the live model, every request that reaches the server triggers a fresh request for tweets from Twitter. With even a modest number of users this would quickly exceed Twitter's rate limits and render the system useless until the API window reopens 15 minutes later, only to be rapidly exceeded again. Since every user is looking for the same tweets, only one upstream request is necessary, and its result should be cached and served to successive client requests.
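A minimal sketch of such a cache is given below, written in Javascript regardless of the backend's actual implementation language; the refresh interval and the fetchTweetsFromTwitter helper are assumptions.

    // Serve every client from one cached Twitter response, refreshed at most
    // once per interval so the upstream rate limit is never approached.
    var cachedTweets = null;
    var cachedAt = 0;
    var REFRESH_INTERVAL_MS = 15 * 1000; // assumed refresh interval

    function getTweets(callback) {
      var now = Date.now();
      if (cachedTweets && now - cachedAt < REFRESH_INTERVAL_MS) {
        callback(cachedTweets);
        return;
      }
      fetchTweetsFromTwitter(function (tweets) { // hypothetical helper wrapping the Twitter API call
        cachedTweets = tweets;
        cachedAt = now;
        callback(tweets);
      });
    }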

The other need for caching applies to both the live and scenario models. The sentiment of each tweet is calculated at the time of request, which is wasteful of resources: if two users are requesting


the same tweets, there is no need to analyse them twice. Online analysis is unavoidable in the live model, but each tweet should be analysed only once and the result cached for future requests from other users. In the Dublin Marathon scenario, tweets can be pre-analysed, removing the need for the sentiment server altogether; this would improve responsiveness and scalability massively.

Tweets could also be preprocessed. Emojis and hashtags are unlikely to provide meaningful inputs to the system; if they were substituted with clearer words or removed altogether, a performance boost could be expected.
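A sketch of such a preprocessing step is given below; the particular substitutions are assumptions.

    // Strip URLs and @-mentions, keep the word inside a hashtag, and drop
    // characters outside the basic multilingual plane (which covers most emoji).
    function preprocessTweet(text) {
      return text
        .replace(/https?:\/\/\S+/g, '')                  // URLs
        .replace(/@\w+/g, '')                            // @-mentions
        .replace(/#(\w+)/g, '$1')                        // "#great" becomes "great"
        .replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, '')  // surrogate pairs (emoji and similar)
        .replace(/\s+/g, ' ')
        .trim();
    }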


Chapter 6

Conclusions

In conclusion, deep learning is an effective means of performing sentiment analysis, as sentiment is one projection of semantic meaning. The technology set out in [31] is the only technology considered in this project that takes the entire semantic meaning of a sentence into account before applying this projection. Words alone cannot determine sentiment without first taking into account what they mean in the context of the sentence and the order in which they appear. Without this composition of semantic vectors, further progress in sentiment analysis is unlikely.

By leveraging the flexibility of HTTP, a platform agnostic endpoint contained within a replicable server is created. This allows for a scalable sentiment analysis server infrastructure.

This powers two novel implementations of sentiment analysis: one operating on any piece of text visited in the browser, the other temporally mapping tweets across a map during a given scenario. Implementations of this type are effective uses of the technology and warrant further investigation in the future.


References

[1] Amazon Web Services Elastic Cloud 2. http://aws.amazon.com/ec2/. [Online; accessed 17-April-2015]. 2015.

[2] Audiolet - JavaScript library for audio synthesis and composition. http://oampo.github.io/Audiolet/. [Online; accessed 17-April-2015]. 2015.

[3] Yoshua Bengio. “Learning deep architectures for AI”. In: Foundations and Trends® in Machine Learning 2.1 (2009), pp. 1–127.

[4] Yoshua Bengio et al. “A Neural Probabilistic Language Model”. In: J. Mach. Learn. Res. 3 (Mar. 2003), pp. 1137–1155. ISSN: 1532-4435. URL: http://dl.acm.org/citation.cfm?id=944919.944966.

[5] Don Chartier. Short and Sweet Movie Reviews. http://shortandsweet.blogspot.ie/. [Online; accessed 17-April-2015]. 2015.

[6] Cloudflare. https://www.cloudflare.com/. [Online; accessed 17-April-2015]. 2015.

[7] Ronan Collobert and Jason Weston. “A unified architecture for natural language processing: Deep neural networks with multitask learning”. In: Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 160–167.

[8] Content Scripts | Mozilla Developer Network. https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts. [Online; accessed 17-April-2015]. 2015.

[9] Cross-Origin Resource Sharing | w3.org. http://www.w3.org/TR/cors/. [Online; accessed 17-April-2015]. 2015.

[10] Firefox Extensions. https://addons.mozilla.org/en-US/firefox/extensions/. [Online; accessed 17-April-2015]. 2015.

[11] Git --fast-version-control. http://git-scm.com/. [Online; accessed 17-April-2015]. 2015.

[12] Google Chrome Browser. https://www.google.ie/chrome/browser/desktop/. [Online; accessed 18-April-2015]. 2015.

[13] GRAISearch. Use of Graphics Rendering and Artificial Intelligence for Improved Mobile Search Capabilities. FP7-PEOPLE-2013-IAPP (612334), 2015–18.

[14] Heroku. https://www.heroku.com/. [Online; accessed 17-April-2015]. 2015.

[15] Highcharts - Interactive JavaScript charts for your webpage. http://www.highcharts.com/. [Online; accessed 17-April-2015]. 2015.

[16] jQuery Selectors. https://api.jquery.com/category/selectors/. [Online; accessed 17-April-2015]. 2015.


[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. URL: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[18] Christopher D. Manning et al. “The Stanford CoreNLP Natural Language Processing Toolkit”. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2014, pp. 55–60. URL: http://www.aclweb.org/anthology/P/P14/P14-5010.

[19] Mapbox | Design and publish beautiful maps. http://www.mapbox.com/. [Online; accessed 17-April-2015]. 2015.

[20] Maven. http://search.maven.org/. [Online; accessed 17-April-2015]. 2015.

[21] Tomas Mikolov et al. “Efficient estimation of word representations in vector space”. In: arXiv preprint arXiv:1301.3781 (2013).

[22] John O’Neil. Doing Things with Words, Part Two: Sentence Boundary Detection. http://web.archive.org/web/20131103201401/http://www.attivio.com/blog/57-unified-information-access/263-doing-things-with-words-part-two-sentence-boundary-detection.html. [Online; accessed 17-April-2015]. 2008.

[23] Jordan B. Pollack. “Recursive distributed representations”. In: Artificial Intelligence 46.1 (1990), pp. 77–105.

[24] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank - Live Demo. http://nlp.stanford.edu:8080/sentiment/rntnDemo.html. [Online; accessed 17-April-2015]. 2015.

[25] Regular Expressions - Javascript | MDN. https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions. [Online; accessed 17-April-2015]. 2015.

[26] Eric Rescorla. HTTP Over TLS. RFC 2818. RFC Editor, May 2000, pp. 1–7. URL: http://tools.ietf.org/html/rfc2818.

[27] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. “Learning internal representations by error-propagation”. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1. MIT Press, Cambridge, MA, 1986, pp. 318–362.

[28] Sentiment Rain | Dublin Marathon 2014. http://sentiment-rain.conorbrady.com/scenario. [Online; accessed 17-April-2015]. 2015.

[29] Sentiment Rain | Live Over San Fransisco. http://sentiment-rain.conorbrady.com/live. [Online; accessed 17-April-2015]. 2015.

[30] Sinatra - A Ruby Server Micro-framework. http://www.sinatrarb.com/. [Online; accessed 17-April-2015]. 2015.

[31] Richard Socher et al. “Recursive deep models for semantic compositionality over a sentiment treebank”. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2013, pp. 1631–1642.


[32] Richard Socher et al. “Semantic Compositionality Through Recursive Matrix-vector Spaces”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. EMNLP-CoNLL ’12. Jeju Island, Korea: Association for Computational Linguistics, 2012, pp. 1201–1211. URL: http://dl.acm.org/citation.cfm?id=2390948.2391084.

[33] Richard Socher et al. “Semi-supervised Recursive Autoencoders for Predicting Sentiment Distributions”. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP ’11. Edinburgh, United Kingdom: Association for Computational Linguistics, 2011, pp. 151–161. ISBN: 978-1-937284-11-4. URL: http://dl.acm.org/citation.cfm?id=2145432.2145450.

[34] Spring Web MVC Framework. http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html. [Online; accessed 17-April-2015]. 2015.

[35] Dong Yu and Li Deng. Automatic Speech Recognition - A Deep Learning Approach. Springer, Oct. 2014. URL: http://research.microsoft.com/apps/pubs/default.aspx?id=230891.
