
Thirty Third International Conference on Information Systems, Orlando 2012 1

Bulls, Bears…and Birds? Studying the Correlation between Twitter Sentiment

and the S&P500 Completed Research Paper

Eric D. Brown

[email protected]

Introduction

With the phenomenal growth of the Twitter social networking platform from its first user in 2006 to over 140 million users in 2012, there has been a great deal of interest in analyzing Twitter messages to determine if there are any actionable uses for this user-generated content. One area that has received interest is the analysis of Twitter messages for sentiment to be used as signals for investing decisions.

For the purposes of this paper, sentiment can be generally defined as an attitude or opinion on a subject (Webster 2012). In addition, Sentiment Analysis can generally be defined as the use of natural language processing methods to analyze text to determine whether sentiment (or other subjective information) exists in that text (Lui 2010). Performing Sentiment Analysis on text provides an output of opinion types (e.g., positive, negative, neutral, etc.) which can then be used as inputs for other analysis methods.

A review of the literature makes it apparent that researchers are attempting to find approaches that use Twitter sentiment for predictive purposes in stock market decision support (Bollen et al. 2010; Brown 2012; Sprenger et al. 2010; Wolfram 2010; Zhang et al. 2011). While the research into sentiment for predictive purposes is interesting, very little research has been conducted that compares Twitter sentiment to well-regarded stock market sentiment measures.

In this paper, a research project is described that uses data analysis techniques combined with Natural Language Processing (NLP) to collect and analyze Twitter messages, determine whether sentiment is conveyed and, if so, how well that sentiment compares to existing financial market sentiment measures, such as the American Association of Individual Investors (AAII) sentiment survey (AAII 2012).

At the outset of this project, there were three main goals: (1) to determine whether data analytics can be used to automate sentiment analysis of Twitter messages; (2) to determine whether the outcome of the analytical engine is comparable to existing survey methods; and (3) to determine whether there is any actionable knowledge contained within Twitter sentiment that can be used to make investing decisions.

Each of the above goals was accomplished. Twitter data can be collected and analyzed using standard sentiment analysis methods. Additionally, Bayesian classification techniques combined with a manually generated training data set produced an outcome that is correlated with existing sentiment survey methods (e.g., the AAII Sentiment Survey). Lastly, standard statistical methods show very little correlation between the Twitter sentiment found with this study's sentiment analysis methods and the closing price of the S&P 500 Index.

The remainder of this paper consists of five sections. Section 1 provides a brief introduction to the idea of sentiment and previous research in the space. Section 2 describes the Twitter message collection and analysis techniques used for an automated sentiment analysis method. Section 3 provides an analysis of the outcome of the automated sentiment analysis techniques. Section 4 compares Twitter sentiment to the AAII sentiment survey and provides an analysis of whether either of these sentiment measures might be a useful input to predictive modeling. Lastly, the paper concludes with Section 5, where the outcome of this research project and future avenues for research are discussed.

Research Track


An introduction to Sentiment and the Markets

The concept of stock market sentiment is nothing new; it dates back at least to Keynes (1936), who popularized it with the beauty contest idea in the early twentieth century. Both academics and market practitioners have developed models based around investor and market sentiment (Baker et al. 2007; Barberis et al. 1998; Otoo 1999). Historically, sentiment measures have been gathered and calculated on a longer-term time-frame using surveys of investors, brokers and other market participants (AAII 2012; Baker et al. 2007; Barberis et al. 1998; NAAIM 2012). These surveys attempt to gauge the bias of market participants in order to better understand how the market might move in the near future (Hassan et al. 2010). There has been considerable research in this space attempting to create predictive models for investing decision support, with some relative success (Baker et al. 2007; Barberis et al. 1998; Hassan et al. 2010; King 2011; Lutz 2010; Mian et al. 2010; Mitra et al. 2011; Oh et al. 2011; Otoo 1999).

With the rise of social networks and Web 2.0 platforms, many researchers have tried to determine whether sentiment analysis could be applied to these platforms and whether any actionable information could be gathered from websites, forums and other user-generated content (Antweiler et al. 2004; Chua et al. 2009; Tumarkin et al. 2001b; Wysocki 1998). The literature in this space shows no clear agreement as to whether sentiment gathered from user-generated content provides any edge in making investing decisions (Antweiler et al. 2004; Chua et al. 2009; Tumarkin et al. 2001b; Zhang 2009).

In recent years, the growth and enormous reach of Twitter have led researchers to revisit previous research relating to the use of user-generated content for determining sentiment and taking action based on these sentiment measures (Brown 2012; Das et al. 2007; Zhang et al. 2011). In addition to trying to understand sentiment, there have been reports of quite accurate predictive models built upon sentiment gathered from social networks like Twitter (Bollen et al. 2010; Oh et al. 2011; Sprenger et al. 2010; Wolfram 2010; Zhang et al. 2011).

With these recent research projects in mind, Twitter messages were collected and analyzed to determine whether sentiment gathered from these messages can be used to understand stock market sentiment. This sentiment is then compared to more manual, survey-based sentiment measures such as the AAII Sentiment Survey.

Twitter Message Collection and Analysis

Twitter, one of the most popular social networking websites today, has grown from its founding in 2006 to over 140 million users and handles over 340 million Twitter messages per day (Twitter 2012). This rapid growth and quick adoption by users has moved Twitter into the general lexicon of modern society. The premise behind Twitter is to provide users with the ability to communicate with other Twitter users using messages of no more than 140 characters in length.

While the use of the Twitter service is quite simple and straightforward, many users have built complex relationships, communities and businesses on top of it. One community that has taken advantage of the Twitter platform is StockTwits, a community of investors and traders who use Twitter to share investing ideas, trade outcomes and other pertinent information.

In 2008, a few entrepreneurs, sensing the value of Twitter for investors, started a company called StockTwits with the goal of creating a socially driven reporting and communication platform for the financial and stock market community; the platform has since grown to over 150,000 users (Stocktwits.com 2012). The StockTwits platform is built to run on top of the Twitter platform, which allows any user on StockTwits.com to see content shared on Twitter and vice versa. This integration with Twitter, the relatively large user base and the development of the StockTwits platform have created a widespread community built around the stock market.

With a community like StockTwits sharing information about the market, and with that information made available to others via each user's publicly available Twitter stream, these messages can be stored and analyzed to determine if actionable information is available.


Collecting Twitter Messages

Twitter provides access to the Twitter platform via an Application Programming Interface (API) which provides developers with the ability to tap into the Twitter ecosystem (Twitter 2011b). In order to capture Twitter messages in real-time, Streaming API methods are used to read and store Twitter messages containing a keyword provided by the developer’s application.

For the purposes of this research, a selection of stock market symbols from the Standard & Poor's 500 Index (S&P500) is used as the keywords tracked via the Twitter Streaming API. In an attempt to limit the collected messages to those specifically mentioning a stock or company, the nomenclature made popular by StockTwits is used: a dollar symbol ($) prepended to the stock symbol. As an example, the S&P500 is generally referred to by most investors using the symbol 'SPX'; using the StockTwits nomenclature, the keyword tracked via the Twitter API would be "$SPX" (Stocktwits 2011).
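The cashtag convention described above can be expressed as a simple pattern match. The sketch below is written in Python (the language the paper later uses for analysis); the symbol subset and function name are illustrative, not part of the original collection engine.

```python
import re

# Small illustrative subset of the tracked symbols (see Table 1).
TRACKED = {"$SPX", "$SPY", "$AAPL", "$XOM"}

# StockTwits-style cashtag: a dollar sign followed by a short ticker.
CASHTAG = re.compile(r"\$[A-Za-z]{1,5}\b")

def tracked_cashtags(message: str) -> set:
    """Return the tracked cashtags mentioned in a Twitter message."""
    return {tag.upper() for tag in CASHTAG.findall(message)} & TRACKED
```

For example, `tracked_cashtags("$SPX looking weak, $aapl holding up")` returns `{"$SPX", "$AAPL"}`, while a message with no cashtags returns an empty set.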

Due to Twitter API limitations and hardware constraints at the beginning of the research project, only a subset of the symbols that make up the S&P500 could be tracked. Three sectors were selected for tracking purposes: the Energy Sector, the Consumer Staples Sector and the Technology Sector. In addition to these three sectors, the S&P500 Index symbol is tracked along with select S&P500 Exchange Traded Funds (ETFs). These sectors were chosen to ensure the total number of keywords tracked was below the Twitter API limit on the number of keywords that can be tracked at one time by an account. A full list of symbols / keywords tracked using the Twitter API is provided in Table 1.

Table 1. Listing of Symbols tracked on Twitter

Energy Sector     Consumer Staples    Technology Sector                General
$XLE  $NBL        $XLP   $AVP         $XLK   $CTSH  $SNDK  $WIN       $SPX
$XOM  $MEE        $PG    $SLE         $AAPL  $GLW   $WU    $HRS       $SPY
$CVX  $VLO        $PM    $MJN         $MSFT  $BRCM  $APH   $FTR       $XLY
$SLB  $CAM        $WMT   $CCE         $IBM   $YHOO  $XRX   $CSC       $XLF
$COP  $MUR        $KO    $TAP         $T     $INTU  $FISV  $JBL       $XLV
$OXY  $FTI        $KFT   $CLX         $GOOG  $CCI   $WDC   $LSI       $XLI
$APA  $CNX        $MO    $CAG         $ORCL  $DELL  $JNPR  $EA        $XLB
$HAL  $SWN        $CVS   $HSY         $VZ    $ADBE  $XLNX  $TSS       $XLU
$APC  $DNR        $PEP   $EL          $INTC  $CTXS  $CA    $MOLX
$MRO  $NBR        $CL    $SWY         $QCOM  $AMAT  $NVDA  $TER
$DVN  $RDC        $WAG   $MKC         $CSCO  $S     $KLAC  $JDSU
$BHI  $RRC        $COST  $BFB         $V     $TEL   $VRSN  $FLIR
$NOV  $COG        $KMB   $TSN         $EMC   $MSI   $FIS   $PCS
$EOG  $TSO        $GIS   $WFMI        $EBAY  $TDC   $LLTC  $SAI
$HES  $SUN        $ADM   $CPB         $MA    $SYMC  $ADSK  $AMD
$CHK  $NE         $HNZ   $SJM         $ACN   $NTAP  $FFIV  $LXK
$WMB  $NFX        $SYY   $DPS         $TXN   $ALTR  $AKAM  $FSLR
$PXD  $DO         $KR    $STZ         $ADP   $ADI   $MCHP
$BTU  $EQT        $K     $HRL         $CTL   $RHT   $BMC
$SE   $QEP        $RAI   $SVU         $HPQ   $PAYX  $LRCX
$EP   $HP         $LO    $DF          $CRM   $STX   $MU


The Twitter API provides methods that allow any Twitter message containing a tracked keyword to be routed to whatever method the developer chooses for storage or analysis (Twitter 2011a). Using the PHP programming language, a Twitter message collection engine was created that connected to the Twitter Streaming API and listened for any of the symbols listed in Table 1. These messages and their relevant metadata (e.g., message, date sent, sending user, etc.) were then stored in a MySQL database.
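The storage step can be sketched as follows. This is a minimal stand-in, not the paper's engine: the actual implementation was written in PHP against MySQL, whereas this sketch uses Python with an in-memory SQLite database and an illustrative schema.

```python
import sqlite3

# In-memory SQLite stands in for the paper's MySQL store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        id      INTEGER PRIMARY KEY,
        message TEXT NOT NULL,
        sent_at TEXT NOT NULL,
        sender  TEXT NOT NULL
    )
""")

def store_tweet(message: str, sent_at: str, sender: str) -> None:
    """Persist one captured Twitter message and its relevant metadata."""
    conn.execute(
        "INSERT INTO tweets (message, sent_at, sender) VALUES (?, ?, ?)",
        (message, sent_at, sender),
    )
    conn.commit()

# Illustrative message; field values are made up.
store_tweet("$SPX testing resistance", "2011-11-04T14:30:00Z", "example_user")
```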

The data collection process using this PHP collection engine has been ongoing since May 1, 2011 and has resulted in over 1.9 million Twitter messages captured and stored. While every attempt has been made to keep the collection engine available around the clock, there have been instances where the Twitter API was unavailable or the system was down. There has, however, been no significant downtime on the collection engine since November 1, 2011, so that date was used as the starting point for analysis.

Analyzing Twitter Messages

In order to extract actionable and useful knowledge from the collected Twitter messages, each message is analyzed for the sentiment it contains.

Sentiment analysis, also called opinion mining in the literature, has been used in one form or another for years, but the automated use of natural language processing and computational linguistics has grown in popularity in recent years, with researchers studying sentiment analysis techniques and their application to various domains, including movie reviews (Thet et al. 2009), general opinion mining (Pak et al. 2010; Pang et al. 2008a) and attempts to predict the movement of the stock market (Bollen et al. 2010; Sprenger et al. 2010; Tumarkin et al. 2001a; Zhang et al. 2011).

It has been shown that Twitter messages can be analyzed for sentiment using various analysis techniques (Bifet et al. 2010; Go et al. 2009; Pak et al. 2010; Thelwall et al. 2011). In addition, further research has reported that sentiment analysis of Twitter messages does provide value for stock market decisions (Bollen et al. 2010; Sprenger et al. 2010; Wolfram 2010; Zhang et al. 2011).

This project uses the Naïve Bayesian classification algorithm, a well-researched sentiment analysis technique that provides an efficient and accurate method for analyzing text for sentiment (Durant et al. 2006; Frank et al. 2006; Pang et al. 2008b; Pang et al. 2002a; Sahami et al. 1998).

While a detailed description of the Bayesian classification method is outside the scope of this paper, it is worth highlighting that this approach uses probabilities to assign a given class to text. In a Bayesian approach to sentiment analysis, a sentence is broken into words, a probability value is assigned to each word, and these values are combined to provide an overall sentence probability (Lin et al. 2009). This probability is then used to assign a sentiment category to the sentence. For the purposes of this study, the sentiment categories used are based on market nomenclature for positive and negative sentiment. The four classes of sentiment are:

Bullish for those messages that denote a positive sentiment.

Bearish for those messages that denote a negative sentiment.

Neutral for those messages that do not convey any discernible sentiment.

Spam for those messages that are not delivering market information but are instead related to internet marketing.
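The per-word probability combination described above can be sketched as a small hand-rolled Naïve Bayes classifier. This is a stand-in: the study used NLTK's implementation, and the tiny four-message training set below is illustrative only. The sketch sums log-probabilities with add-one smoothing and assigns the highest-scoring class.

```python
import math
from collections import Counter, defaultdict

def train(labeled_messages):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_messages:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, sum(class_counts.values())

def classify(model, words):
    """Assign the class with the highest summed log-probability."""
    class_counts, word_counts, vocab, total = model
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny illustrative training set (the study used 1,000 hand-coded
# messages per class).
training = [
    (["solid", "buy", "monster", "blowout"], "Bullish"),
    (["expect", "continue", "higher", "buy"], "Bullish"),
    (["charts", "broken", "crash"], "Bearish"),
    (["markets", "plunge", "crash", "warns"], "Bearish"),
]
model = train(training)
```

With this toy model, a message containing "monster blowout" classifies as Bullish, while one containing "crash warns" classifies as Bearish.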



It is difficult to pin down the origin of the terms 'bullish' and 'bearish', as there are numerous accounts of how they came into use. They appear to have entered the lexicon of everyday investors through their use in the popular Dow Theory of the early to mid-twentieth century (Brown et al. 1998).

While the Bayesian classification method is relatively easy to implement from a technical perspective using any modern programming language, an existing implementation was selected in order to reduce development time and the inaccuracies that might arise from an improper implementation. After reviewing open source software and classification tools, the Python programming language was selected as the baseline platform for analysis, along with the Natural Language Toolkit (NLTK) for natural language processing (Loper et al. 2002; Rossum et al. 1991). The Python programming language is well known and often cited within the academic community and is used throughout many different areas of research, including bioinformatics, computer science, finance and artificial intelligence (Antiga et al. 2008; Cai et al. 2006; Chapman et al. 2004; Rossum et al. 1991; Xing et al. 2005). In addition, the NLTK module is a well-regarded and often-cited natural language processing module for Python (Curran 2003; Hearst 2005; Liddy et al. 2005; Lu et al. 2009; Sangeetha et al. 2012).

In order to use the Bayesian approach, a training dataset must be created before sentiment analysis is undertaken. The creation of the training dataset is a time-consuming and manual process whereby a random sampling of data is selected and codified into sentiment categories (Pang et al. 2002b). For the purposes of this study, a training dataset was created by taking a random sample of the captured Twitter messages. Each message in the training dataset was manually assigned a value of Bullish, Bearish, Neutral or Spam, and the dataset consisted of 1,000 messages in each of the four categories. These manually codified messages were then used as the input training set during the automated sentiment analysis of the remaining Twitter messages. A sampling of the training dataset is shown in Table 2.

Table 2. Training Dataset Samples

Bullish

consumer staples outperforming the broader market, expect this to continue

apple numbers are out! a monster blowout!

excellent time to buy into. this stock is solid and will continue to go up.

Bearish

if dexia doesnt get a bailout, markets will plunge%+ in a session, it is a lot bigger than lehman ever was.

if the charts werent broken before, they are now

dont forget, sp500 monthly chart warns of impending crash.

Neutral

what to expect from the big google music announcement tomorrow $GOOG

who needs those rating agencies anyways, we have jp morgan.

tuesdays light economic calender includes retail, housing

Spam

unlimited free tv shows on your pc, free channels

I always look like a new man after a haircut. SWAG

if i got a missed call there wasnt money involved


The NLTK provides a means to measure the accuracy of a classifier's training dataset by training the classifier on the dataset and then classifying that same dataset. This produces an accuracy value that provides some insight into how well the training dataset performs. With the training dataset developed in this research, the NLTK accuracy method reports an accuracy of 89.35%.
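The accuracy figure above is simply the fraction of training items the trained classifier labels with their own hand-coded label. A minimal, self-contained sketch of that check follows; the one-rule classifier and the four labeled items here are stand-ins, not the study's classifier or data.

```python
def accuracy(classifier, labeled_data):
    """Fraction of (words, gold_label) pairs the classifier reproduces."""
    correct = sum(1 for words, gold in labeled_data if classifier(words) == gold)
    return correct / len(labeled_data)

# Stand-in one-rule classifier: flag a message Bearish if it says 'crash'.
toy_classifier = lambda words: "Bearish" if "crash" in words else "Bullish"

toy_data = [
    (["monster", "blowout"], "Bullish"),
    (["impending", "crash"], "Bearish"),
    (["buy", "the", "dip"], "Bullish"),
    (["crash", "soon"], "Bullish"),  # deliberate mismatch for illustration
]
```

On this toy data the classifier agrees with three of the four gold labels, giving an accuracy of 0.75.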

Utilizing Python and the NLTK as the analysis system in conjunction with the training dataset, each collected Twitter message was analyzed using the NLTK's Bayesian classification method and assigned a sentiment. Table 3 shows the outcome of the analysis of the more than 1.9 million messages. Please note that the total is not equal to 100% because some messages were not classified, as they did not contain any content (e.g., the Twitter message is blank).

A quick look at the outcome of the analysis shows that half of the messages captured are classified as 'neutral', which provides some insight into the main usage of Twitter by members of the investing community: rather than share their trade ideas and market bias, they are sharing information and interacting with other members of the community. Another outcome is the number of messages classified as Spam, which can be attributed to the use of the term "$wag" by Twitter users. Instead of using this term to refer to Walgreen's stock symbol in the StockTwits nomenclature ("$WAG"), the majority of these messages were from people using the term as a form of slang meaning "cool" (Chen 2011). While these messages aren't necessarily spam in the broader sense of the term, they aren't market related and can be considered spam for the purposes of this research.

Table 3. Sentiment Analysis Outcome – All Data (May 2011 to August 2012)

Classification   Count       Percentage
Bullish          360,822     18.44%
Bearish          344,700     17.63%
Neutral          978,936     50.00%
Spam             269,681     13.79%
Total            1,954,565   99.86%

For the purposes of this paper, the entire dataset is not used, as there were some days between May 1, 2011 and November 1, 2011 without data due to the Twitter API being down and/or software issues. The dataset used through the rest of this paper contains data captured from November 1, 2011 through August 30, 2012. The outcome of the sentiment analysis of this dataset is provided in Table 4. Please note that the total is not equal to 100% because some messages were not classified, as they did not contain any content (e.g., the Twitter message is blank).

Table 4. Sentiment Analysis Outcome – Data from November 2011 to August 2012

Classification   Count       Percentage
Bullish          304,046     17.85%
Bearish          281,860     16.55%
Neutral          876,803     51.47%
Spam             240,460     14.11%
Total            1,703,585   99.98%


Twitter Sentiment and the Market

With the Twitter messages from the November 2011 to August 2012 dataset analyzed for sentiment, a correlation with the stock market can be attempted to determine if there are any signals that provide predictive capabilities for use in investing decisions.

The S&P 500 Index (SPX) is used as the surrogate for the market, using the closing price of the SPX at the end of each day. This end-of-day data was obtained from EODData.com and stored in a MySQL database for analysis (Eoddata.com 2012).

In order to match the Twitter sentiment with the close price of the SPX, the bullish and bearish Twitter sentiment for each day is summed and a Bear/Bull ratio is calculated by dividing the bearish sentiment count by the bullish sentiment count on a daily basis. The Bear/Bull ratio doesn't have a particular scale, but generally a ratio under 1.0 tends to be more bullish, as the denominator is larger (i.e., there is more bullish sentiment), and a ratio over 1.0 points to more bearish sentiment, as the numerator is larger (i.e., there is more bearish sentiment). An example of the daily sentiment Bear/Bull ratio and SPX close price is shown in Table 5.
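The daily ratio computation is straightforward; a minimal sketch follows, with illustrative daily counts (the function name and counts are not from the original study).

```python
def bear_bull_ratio(bearish_count: int, bullish_count: int) -> float:
    """Daily Bear/Bull ratio: under 1.0 leans bullish, over 1.0 leans bearish."""
    return bearish_count / bullish_count

# Illustrative counts: 1,500 bearish messages against 2,000 bullish ones
# yields a ratio of 0.75, a bullish-leaning day.
ratio = bear_bull_ratio(bearish_count=1500, bullish_count=2000)
```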

Table 5. Twitter Bear / Bull Ratio and SPX Close

Date        Bear/Bull Ratio   SPX Close
11/4/2011   0.7583            1253.23
11/7/2011   0.6204            1261.12
11/8/2011   0.8141            1275.92

It should be noted that the Bear/Bull ratio was only sampled for weekdays, as those are the days the market is open. Data for weekends is available but, for the purposes of this paper, is not included in the analysis.

Basic statistical analysis of the Twitter Bear/Bull ratio and the SPX close shows that there is very little correlation between the daily Bear/Bull ratio and the same day's close of the S&P 500. Descriptive statistics are shown in Table 6 and a scatterplot of the relationship between the Twitter Bear/Bull ratio and the SPX close is shown in Figure 1.
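The Pearson coefficient used throughout this comparison can be computed directly from the paired daily series; a self-contained sketch (the series passed in would be the daily Bear/Bull ratios and SPX closes):

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A perfectly linear increasing pair of series yields r = 1.0, a perfectly decreasing pair r = -1.0, and uncorrelated series values near 0.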

Table 6. Descriptive Statistics for Twitter Bear / Bull Ratio and SPX Close

Variable    N     N*   Mean     SE Mean   StDev    Min      Q1       Median   Q3       Max
Bear/Bull   213   0    0.9398   0.0175    0.2554   0.3340   0.7590   0.9333   1.1059   1.9694
SPX Close   213   0    1333.5   4.21      61.4     1158.7   1293.1   1347.1   1377.7   1419.0


Figure 1: Scatterplot of Twitter Bear / Bull Ratio vs SPX Close

As can be seen in the scatterplot, there does not appear to be any clear correlation between the daily Twitter Bear/Bull ratio and the close of the S&P 500 Index. Calculating the Pearson correlation coefficient and P-Value, shown in Table 7, confirms that there is no statistically significant correlation between the daily Twitter Bear/Bull ratio and the SPX close.

Table 7. Correlation between Twitter Bear / Bull and SPX Close

Pearson correlation   P-Value
0.015                 0.824

While, statistically speaking, the Twitter Bear/Bull ratio is not directly correlated with the close price of the SPX, a graphical comparison of the Bear/Bull ratio and the SPX shows some patterns suggesting that some form of predictive capability may yet be found within the Twitter Bear/Bull ratio. A comparison of the SPX daily close and the daily Bear/Bull Twitter sentiment is shown below. Figure 2 is a graph of the SPX daily close from November 1, 2011 to August 31, 2012. Figure 3 shows the 21 day moving average of the daily Bear/Bull Twitter sentiment (top pane) and the actual daily Bear/Bull Twitter sentiment (bottom pane).

As can be seen in Figure 3, the daily Bear/Bull Twitter sentiment (bottom pane) is very noisy, and it is difficult to notice any patterns between it and the market. Due to this noise, a moving average is used to smooth out the noisy patterns. In this instance, a rolling 21 day moving average / smoothing period was utilized, as this time-frame uses approximately three weeks of data to create the average value. The period of 21 days was chosen due to its wide acceptance as a standard moving average period among stock market technical analysts (Deaton 2012; Ma et al. 2004).
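The smoothing step is a trailing simple moving average over the daily ratio series; a minimal sketch, with the window length parameterized (21 in the study):

```python
def moving_average(series, window=21):
    """Trailing simple moving average; the first window-1 points are omitted."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]
```

For example, a window of 3 over [1, 2, 3, 4, 5] yields [2.0, 3.0, 4.0].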

While comparing the SPX Daily Close and the 21 Day MA of the Twitter Bear/Bull Ratio (shown in the top pane above), there appears to be some areas of extremes that tend to match movements in the SPX Daily Close. For example, in December 2011, the Bear/Bull ratio is more Bearish while at the same time the SPX was making lows. Additionally, as the SPX was moving higher in the time period between Jan 2012 to April 2012, the Twitter Bear/Bull sentiment remained Bullish. An additional example can be seen as the SPX begins to decline in the first half of April 2012 and the Twitter Bear/Bull Sentiment moves from Bullish to more Bearish levels.


Figure 2: S&P 500 Close

Figure 3: Twitter Bear / Bull Ratio

A Pearson correlation of the SPX daily close and the 21 day MA of the Bear/Bull ratio shows little direct correlation between the two series, with a correlation of 0.127 and a P-Value of 0.064. Again, this shows little statistical correlation between Twitter sentiment and the close of the SPX on a daily basis.

From a purely statistical standpoint, there seems to be very little correlation between Twitter sentiment and the SPX daily close within this time period. That said, from a big-picture standpoint, there seems to be something within Twitter sentiment: by considering the 21 day MA, reviewing patterns and comparing sentiment to previous periods, Twitter sentiment might provide some value in helping investors understand market direction and sentiment.


The lack of correlation between Twitter sentiment and the SPX makes it difficult to justify undertaking research to create predictive models with Twitter sentiment. However, before declaring that there is no value to be found in Twitter sentiment, it is useful to look at other sentiment measures to determine whether those measures are correlated with the market and how Twitter sentiment compares to them.

Sentiment Comparison – Twitter and the AAII Sentiment Survey

A comparison was made between the American Association of Individual Investors (AAII) Sentiment Survey and Twitter Sentiment in order to determine if there is any correlation between Twitter Sentiment and sentiment found through more manually generated surveys.

The American Association of Individual Investors (AAII) is a non-profit organization focused on providing education and services to investors to assist with investing decisions. As part of these services, they provide a weekly report containing the results of a survey of investor sentiment (AAII 2012). This report, which measures the sentiment of AAII members, has been released weekly since 1987 and is provided as a downloadable spreadsheet from the AAII website (AAII 2012).

Due to the long history of the AAII Sentiment Survey and the ease of access to the data, this sentiment data was used for comparison with Twitter Sentiment. In order to compare the weekly AAII sentiment data to the Twitter Sentiment data, the Twitter Sentiment data was transformed from a daily measure to a weekly measure. This transformation was accomplished by combining the sentiment data from Thursday through Wednesday of each week (not including weekends) into a Weekly Sentiment value, matching the reporting periods and days used by the AAII survey.
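The daily-to-weekly transformation can be sketched as follows; the dictionary shape and the per-day bearish/bullish counts are illustrative assumptions, not the study's actual data model:

```python
from collections import defaultdict
from datetime import date, timedelta

def week_ending_wednesday(d: date) -> date:
    # Wednesday has weekday() == 2; advance 0-6 days to the next Wednesday.
    return d + timedelta(days=(2 - d.weekday()) % 7)

def to_weekly_ratio(daily):
    """daily maps date -> (bearish_count, bullish_count); weekends are dropped.

    Returns a Bear / Bull ratio per Thursday-through-Wednesday period,
    keyed by the Wednesday that closes the period, to match AAII weeks.
    """
    buckets = defaultdict(lambda: [0, 0])
    for d, (bear, bull) in daily.items():
        if d.weekday() >= 5:      # skip Saturday and Sunday
            continue
        key = week_ending_wednesday(d)
        buckets[key][0] += bear
        buckets[key][1] += bull
    return {k: bears / bulls
            for k, (bears, bulls) in sorted(buckets.items()) if bulls}
```

With a library such as pandas, the equivalent would be a `resample("W-WED")` aggregation over the weekday-filtered counts.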

In addition to the weekly report, the AAII report provides an eight week average of investor sentiment. For comparison purposes, the same eight week average was created for the Twitter data by applying a simple moving average to the weekly Twitter Sentiment. The AAII Bear / Bull eight week average and raw data are provided in Figure 4.
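The eight week average applied to the Twitter series is a plain simple moving average, which can be sketched as:

```python
def sma(values, window=8):
    """Simple moving average; returns one value per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]
```

Applied to the weekly Bear / Bull ratios, the first seven weeks produce no average, so the smoothed series starts eight weeks into the collection period.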

Figure 4: AAII Sentiment Survey Data

The weekly Twitter Sentiment data, with the eight week moving average and the raw weekly Twitter Sentiment Bear / Bull data, is provided in Figure 5. At first glance, both eight week averages show very similar patterns, with a move from a high in Jan 2012 to a low in Mar 2012 and a subsequent move back up toward more Bearish sentiment, peaking around July 2012.

Figure 5: Twitter Weekly Sentiment Data

While the value ranges of the AAII and Twitter sentiment measures are different, it would be very difficult to look at Figure 4 and Figure 5 and not see some correlation between the two measures. Running the Pearson Correlation between the two data series shows a strong positive correlation, with the correlation coefficient and P-Value shown in Table 8.

Table 8. Correlation between AAII and Weekly Twitter Bear / Bull

Pearson correlation    P-Value
0.583                  0.000

While the AAII Sentiment Survey and the weekly Twitter Sentiment have a strong positive relationship, the correlations between these sentiment measures and the weekly close of the SPX are still negligible, as shown in Table 9.

Table 9. Correlations between AAII, Weekly Twitter Bear / Bull and SPX Weekly Close

                                  Pearson correlation    P-Value
AAII / SPX                        -0.031                 0.845
Weekly Twitter Sentiment / SPX     0.025                 0.872

Based on the negligible correlation between the AAII Sentiment Survey and the close of the SPX, it would be difficult to justify ongoing research into building predictive models based on using either of these sentiment measures as they are presented.

While the correlation between these two sentiment measures and the close of the SPX is negligible, there do appear to be extremes and inflection points that might be useful to investors making decisions about directional changes in the market. For example, both sentiment measures have an inflection point around January 2012, where sentiment moves in a more bullish direction on both measures while, at the same time, the SPX begins a climb to higher closing prices.

The correlation between Twitter Sentiment and the AAII Sentiment Survey suggests that the sentiment classification approach taken in this project captures a meaningful sentiment signal. This correlation also points to the underlying value of the information shared by Twitter users discussing the market.

Conclusion

The research presented in this project has attempted to determine if sentiment is being conveyed via Twitter messages. Using an implementation of the Bayesian Classification method found in Python’s Natural Language Toolkit, sentiment of captured Twitter messages was determined and a category of Bullish, Bearish, Neutral or Spam was assigned.
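The study's classifier is NLTK's built-in Naive Bayes implementation trained on a manually labeled tweet set. The pure-Python sketch below illustrates the same technique (bag-of-words Naive Bayes with Laplace smoothing) on invented training examples; the words, labels and `train`/`classify` helpers are illustrative assumptions, not the study's code:

```python
import math
from collections import Counter, defaultdict

def train(labeled):
    """labeled: iterable of (text, label) pairs; builds per-label word counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = None, float("-inf")
    for label in label_counts:
        logp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        for word in text.lower().split():
            logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical training tweets; the actual study used a manually labeled set
# spanning Bullish, Bearish, Neutral and Spam categories.
model = train([
    ("buy the rally going long", "Bullish"),
    ("long this market buy", "Bullish"),
    ("sell short the crash", "Bearish"),
    ("market crash sell everything", "Bearish"),
])
```

In NLTK itself, the equivalent steps are building feature dictionaries per message and calling `nltk.NaiveBayesClassifier.train` followed by `classify`.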

The sentiment found within Twitter messages does not appear to have any correlation with the S&P 500 Index closing price on a daily or weekly basis, but there do appear to be points of inflection and points of extreme readings that can provide decision points for investors. Examples of these extreme/inflection points can be seen around the February, May and July 2012 timeframes. Using these inflection points or extremes, an investor can determine that sentiment shifts are occurring and that it might be time to review investments and portfolios to prepare for market movements.

Additionally, the Twitter Sentiment found with the Bayesian Classification approach has a very strong positive correlation with the weekly sentiment survey released by the American Association of Individual Investors. This strong correlation points to the existence of sentiment within these Twitter messages and suggests that Twitter sentiment might be another measure of sentiment that can be used by investors.

There are a number of future opportunities to extend the research reported in this study. One avenue of research that should be undertaken is in the area of sentiment classification methods. This study used the Bayes Classification method with an accuracy of 89.35%, but there may be other methods that would provide more accurate classifications. An additional avenue that deserves more research is that of using sentiment measures gathered from Twitter for creating predictive signals for investing decisions.

While this study showed a negligible correlation between sentiment and the close of the SPX on a daily and weekly basis, different timeframes might provide more opportunities for correlation. Additionally, research into whether the sentiment of individual stocks might be more highly correlated with prices for those stocks is an area that should be considered.

Lastly, a research project that reviews methods combining Twitter sentiment measures with other sentiment measures gathered from news, blogs and other online user-generated content might provide productive insight into investing decisions.

References

AAII "AAII Sentiment Survey," American Association of Individual Investors, http://www.aaii.com/sentimentsurvey, 2012.
Antiga, L., Piccinelli, M., Botti, L., Ene-Iordache, B., Remuzzi, A., and Steinman, D. "An image-based modeling framework for patient-specific computational hemodynamics," Medical and Biological Engineering and Computing (46:11) 2008, pp 1097-1112.
Antweiler, W., and Frank, M. Z. "Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards," Journal of Finance (59:3) 2004, pp 1259-1294.
Baker, M., and Wurgler, J. "Investor Sentiment in the Stock Market," Journal of Economic Perspectives (21) 2007, pp 129-152.
Barberis, N., Shleifer, A., and Vishny, R. "A model of investor sentiment," Journal of Financial Economics (49) 1998, pp 307-343.
Bifet, A., and Frank, E. "Sentiment Knowledge Discovery in Twitter Streaming Data," in: Proceedings of the 13th International Conference on Discovery Science, Springer-Verlag, Canberra, Australia, 2010, pp. 1-15.
Bollen, J., Mao, H., and Zeng, X.-J. "Twitter Mood Predicts the Stock Market," 2010.
Brown, E. "Will Twitter Make You a Better Investor? A Look at Sentiment, User Reputation and Their Effect on the Stock Market," in: SAIS 2012 Proceedings, Association for Information Systems, 2012.
Brown, S. J., Goetzmann, W. N., and Kumar, A. "The Dow Theory: William Peter Hamilton's Track Record Reconsidered," The Journal of Finance (53:4) 1998, pp 1311-1333.
Cai, X., and Langtangen, H. "Parallelizing PDE Solvers Using the Python Programming Language," in: Numerical Solution of Partial Differential Equations on Parallel Computers, A.M. Bruaset and A. Tveito (eds.), Springer Berlin Heidelberg, 2006, pp. 295-325.
Chapman, B. A., Bowers, J. E., Schulze, S. R., and Paterson, A. H. "A comparative phylogenetic approach for dating whole genome duplication events," Bioinformatics (20:2) 2004, pp 180-185.
Chen, A. "The Old Person's Guide to 'Swag'," 2011.
Chua, C. C., Milosavljevic, M., and Curran, J. R. "A Sentiment Detection Engine for Internet Stock Message Boards," 2009.
Curran, J. R. "Blueprint for a high performance NLP infrastructure," in: Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems - Volume 8, Association for Computational Linguistics, 2003, pp. 39-44.
Das, S. R., and Chen, M. Y. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science (54) 2007, pp 1375-1388.
Deaton, A. "The financial crisis and the well-being of Americans," Oxford Economic Papers (64:1) 2012, pp 1-26.
Durant, K., and Smith, M. "Mining Sentiment Classification from Political Web Logs," in: Proceedings of the Workshop on Web Mining and Web Usage Analysis of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (WebKDD-2006), 2006.
Eoddata.com "EOD Data - End of Day Stock Market Data," 2012.
Frank, E., and Bouckaert, R. R. "Naive Bayes for Text Classification with Unbalanced Classes," in: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, 2006, pp. 503-510.
Go, A., Bhayani, R., and Huang, L. "Twitter Sentiment Classification using Distant Supervision," 2009, pp 1-6.
Hassan, T. A., and Mertens, T. M. "Market Sentiment: A Tragedy of the Commons," Social Science Research (101:2) 2010, pp 402-405.
Hearst, M. "Teaching applied natural language processing: triumphs and tribulations," in: Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 1-8.
Keynes, J. M. "General Theory of Employment, Interest and Money," 1936.
King, R. "Trading on a World of Sentiment," 2011.
Liddy, E. D., and McCracken, N. J. "Hands-on NLP for an interdisciplinary audience," in: Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 62-68.
Lin, C., and He, Y. "Joint Sentiment/Topic Model for Sentiment Analysis," in: Proceedings of the 18th ACM Conference on Information and Knowledge Management, ACM, Hong Kong, China, 2009, pp. 375-384.
Loper, E., and Bird, S. "NLTK: the Natural Language Toolkit," in: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, Association for Computational Linguistics, Philadelphia, Pennsylvania, 2002, pp. 63-70.
Lu, T., and Matos, D. M. d. "High-performance high-volume layered corpora annotation," in: Proceedings of the Third Linguistic Annotation Workshop, Association for Computational Linguistics, Suntec, Singapore, 2009, pp. 99-107.
Lui, B. "Sentiment Analysis and Subjectivity," in: Handbook of Natural Language Processing, N. Indurkhya and F.J. Damerau (eds.), 2010.
Lutz, C. "Agent Sentiment and Stock Market Predictability," 2010.
Ma, I., Wong, T., Sankar, T., and Siu, R. "Forecasting the volatility of a financial index by wavelet transform and evolutionary algorithm," in: Systems, Man and Cybernetics, 2004 IEEE International Conference on, 2004, pp. 5824-5829.
Mian, G. M., and Sankaraguruswamy, S. "Investor Sentiment and the Stock Market Response to Earnings News," 2010.
Mitra, G., DiBartolomeo, D., Banerjee, A., and Yu, X. "Automated Analysis of News to Compute Market Sentiment: Its Impact on Liquidity and Trading," 2011.
NAAIM "NAAIM Survey of Manager Sentiment," National Association of Active Investment Managers, 2012.
Oh, C., and Sheng, O. R. L. "Investigating Predictive Power of Stock Micro Blog Sentiment in Forecasting Future Stock Price Directional Movement," in: International Conference on Information Systems 2011, Shanghai, China, 2011.
Otoo, M. W. "Consumer sentiment and the stock market," in: Finance and Economics Discussion Series, Federal Reserve Board, Federal Reserve System, 1999.
Pak, A., and Paroubek, P. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," in: Language Resources and Evaluation (LREC) 2010 Proceedings, Malta, 2010.
Pang, B., and Lee, L. "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval (2) 2008a, pp 1-135.
Pang, B., and Lee, L. "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval (2:1-2) 2008b, pp 1-135.
Pang, B., Lee, L., and Vaithyanathan, S. "Thumbs up?: sentiment classification using machine learning techniques," in: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, Association for Computational Linguistics, 2002a, pp. 79-86.
Pang, B., Lee, L., and Vaithyanathan, S. "Thumbs up?: sentiment classification using machine learning techniques," in: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, Association for Computational Linguistics, 2002b, pp. 79-86.
Rossum, G. v., and Drake, F. L. (eds.) Python Programming Language. PythonLabs, Virginia, USA, 1991.
Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. "A Bayesian Approach to Filtering Junk E-Mail," in: AAAI Workshop on Learning for Text Categorization, Madison, Wisconsin, 1998.
Sangeetha, S., and Arock, M. "Recognising sentence similarity using similitude and dissimilarity features," Int. J. Adv. Intell. Paradigms (4:2) 2012, pp 120-131.
Sprenger, T. O., and Welpe, I. M. "Tweets and Trades: The Information Content of Stock Microblogs," in: Working Paper Series, Technische Universität München (TUM), 2010, p. 89.
Stocktwits "Stocktwits.com," 2011.
Stocktwits.com "About StockTwits," 2012.
Thelwall, M., Buckley, K., and Paltoglou, G. "Sentiment in Twitter events," Journal of the American Society for Information Science and Technology (62) 2011, pp 406-418.
Thet, T. T., Na, J.-C., Khoo, C. S. G., and Shakthikumar, S. "Sentiment Analysis of Movie Reviews on Discussion Boards Using a Linguistic Approach," in: Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, ACM, Hong Kong, China, 2009, pp. 81-84.
Tumarkin, R., and Whitelaw, R. F. "News or Noise? Internet Postings and Stock Prices," Financial Analysts Journal (57) 2001a, pp 41-51.
Tumarkin, R., and Whitelaw, R. F. "News or Noise? Internet Postings and Stock Prices," Financial Analysts Journal (57:3) 2001b, pp 41-51.
Twitter "Twitter API," http://dev.twitter.com, 2011a.
Twitter "Twitter API," http://dev.twitter.com, 2011b.
Twitter "About Twitter," Twitter, http://twitter.com/about, 2012.
Webster, M. "Sentiment Definition," in: Merriam-Webster, 2012.
Wolfram, M. S. "Modelling the Stock Market using Twitter," in: School of Informatics, University of Edinburgh, 2010, p. 74.
Wysocki, P. D. "Cheap Talk on the Web: The Determinants of Postings on Stock Message Boards," SSRN eLibrary, 1998.
Xing, C., Hans Petter, L., and Halvard, M. "On the performance of the Python programming language for serial and parallel scientific computations," Scientific Programming (13:1) 2005, pp 31-56.
Zhang, X., Fuehres, H., and Gloor, P. A. "Predicting Stock Market Indicators Through Twitter “I hope it is not as bad as I fear”," Procedia - Social and Behavioral Sciences, 2011.
Zhang, Y. "Determinants of Poster Reputation on Internet Stock Message Boards," American Journal of Economics and Business Administration (1:2) 2009, pp 114-121.