Forecasting with Twitter data Presented by : Thusitha Chandrapala 20064923 MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA

What information do Twitter messages contain?

• Twitter information
  ▫ Sentiment analysis: are people happy or unhappy about a certain topic?
  ▫ Volume: number of tweets about a given topic

• Does Twitter really help in predicting time series data?
  ▫ Moving stream of information

Motivation of the paper

• Use three different forecasting model families, vary the parameters systematically, and analyze under which conditions Twitter information is actually useful

• Test for non-linearity and causality between the Twitter data and the target

• Introduce the summary tree

Related work

• Stock market prediction
  ▫ Bollen et al.: Twitter -> sentiment -> predict the Dow Jones Industrial Average
  ▫ Wolfram et al.: Twitter as an additional source of features, no sentiment analysis

• Movie box office income
  ▫ Mishne et al.: correlation, blog posts
  ▫ Asur et al.: predict sales

Workflow

1) Collecting data
2) Cleaning and preprocessing
3) Sentiment analysis
4) Prediction model

Preprocessing:

• Language detection

• Negation handling: “I like this…” and “I don’t like this…” are treated as two distinct features

• Relevance filtering and topic classification using LDA (Latent Dirichlet Allocation)
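
The slides do not say how negation handling was implemented. One common approach, sketched here under that assumption, is to prefix every token that follows a negation word with a marker until the next punctuation mark, so that "like" and "NOT_like" become separate features. The negation word list and the `NOT_` prefix are illustrative choices, not taken from the paper.

```python
import re

# Illustrative (assumed) negation words; a real list would be longer.
NEGATIONS = {"not", "no", "never", "don't", "dont", "didn't", "can't", "won't", "isn't"}

def mark_negation(text):
    """Prefix tokens that follow a negation word with NOT_ until punctuation."""
    # Normalise the typographic apostrophe so "don’t" matches "don't".
    text = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+|[.,!?;]", text)
    out, negated = [], False
    for tok in tokens:
        if tok in {".", ",", "!", "?", ";"}:
            negated = False  # punctuation ends the negated scope
            continue
        if tok in NEGATIONS:
            negated = True   # the negation word itself is dropped
            continue
        out.append("NOT_" + tok if negated else tok)
    return out

print(mark_negation("I like this"))
print(mark_negation("I don't like this"))
```

With this scheme the two example sentences from the slide share no "like" feature: one yields `like`, the other `NOT_like`.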

Sentiment classification

• Whether the text contains negative or positive impressions of a given subject

• Approach 1:
  ▫ Automatic tagging to extract training instances
    :) :D - happy sentiment
    :( - unhappy sentiment
  ▫ Binary classification problem: use naïve Bayes to train the classifier
  ▫ Use different dictionaries as features
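
A minimal sketch of Approach 1, assuming a bag-of-words multinomial naïve Bayes with add-one smoothing (the slides do not specify the exact variant or the dictionaries used). Tweets are labelled automatically by emoticon, the emoticons are stripped from the text, and the remaining words are used as features. All function names are hypothetical.

```python
import math
from collections import Counter

HAPPY = (":)", ":D")
UNHAPPY = (":(",)

def emoticon_label(tweet):
    """Return 'pos', 'neg', or None when the tweet has no or mixed emoticons."""
    has_pos = any(e in tweet for e in HAPPY)
    has_neg = any(e in tweet for e in UNHAPPY)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"

def tokenize(tweet):
    """Strip emoticons (they are the labels, not features) and split on whitespace."""
    for e in HAPPY + UNHAPPY:
        tweet = tweet.replace(e, " ")
    return tweet.lower().split()

def train_naive_bayes(tweets):
    """Count words per class using only the automatically labelled tweets."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        doc_counts[label] += 1
        word_counts[label].update(tokenize(tweet))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def classify(model, tweet):
    """Pick the class with the highest log-posterior (add-one smoothing).

    Assumes both classes appeared at least once in training.
    """
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(tweet):
            if word in vocab:  # unseen words are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy tweets, for illustration only.
training = [
    "love this phone :)",
    "great movie :D",
    "hate the delay :(",
    "terrible service :(",
    "no emoticon here",
]
model = train_naive_bayes(training)
print(classify(model, "love this great movie"))
print(classify(model, "terrible delay"))
```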

Sentiment index

• A time series of sentiment values
  ▫ The daily value is calculated from the daily percentage of positive/negative tweets over the total number of messages on a specific topic
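
As a sketch of how such an index could be computed, the function below takes already-classified tweets as `(day, label)` pairs and returns the fraction of positive tweets per day. Using the positive fraction alone is an assumption; the slide only says the value is based on the share of positive/negative tweets. The dates are invented.

```python
from collections import defaultdict

def sentiment_index(labelled_tweets):
    """Map each day to the fraction of that day's tweets labelled 'pos'."""
    pos = defaultdict(int)
    total = defaultdict(int)
    for day, label in labelled_tweets:
        total[day] += 1
        if label == "pos":
            pos[day] += 1
    # Sorting the days yields a time-ordered series.
    return {day: pos[day] / total[day] for day in sorted(total)}

# Invented example data.
tweets = [
    ("2012-03-01", "pos"), ("2012-03-01", "neg"),
    ("2012-03-01", "pos"), ("2012-03-01", "pos"),
    ("2012-03-02", "neg"), ("2012-03-02", "pos"),
]
print(sentiment_index(tweets))
```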

Training the model

• ARMA: Auto-Regressive Moving Average
  ▫ y[t] = a·x[t] + b·x[t-1] + … + m·y[t-1] + n·y[t-2] + …

• Simplified prediction:
  ▫ A binary prediction that says whether y[t] > y[t-1]
  ▫ Uses past values of the target itself and of the Twitter time series
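
The simplified prediction task can be set up as an ordinary classification dataset. The sketch below, with a hypothetical function name and an assumed lag count, builds one feature row per time step from the previous values of the target and of the Twitter series, with label 1 when y[t] > y[t-1]. It uses only past values, so x[t] itself is left out.

```python
def build_direction_dataset(target, twitter, lags=2):
    """Return (features, labels) for predicting whether target[t] > target[t-1].

    Each feature row holds the last `lags` values of the target followed by
    the last `lags` values of the Twitter series, most recent first.
    """
    if len(target) != len(twitter):
        raise ValueError("series must have the same length")
    features, labels = [], []
    for t in range(lags, len(target)):
        past_target = [target[t - k] for k in range(1, lags + 1)]
        past_twitter = [twitter[t - k] for k in range(1, lags + 1)]
        features.append(past_target + past_twitter)
        labels.append(1 if target[t] > target[t - 1] else 0)
    return features, labels

# Invented numbers, for illustration only.
X, y = build_direction_dataset([1.0, 2.0, 1.5, 3.0], [10, 20, 30, 40], lags=2)
print(X)
print(y)
```

The resulting rows can be fed to any of the model families listed in the slides (linear model, SVM, neural network).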

Model parameters

• Target time series
  ▫ Share market: returns
  ▫ Movie box office: revenue

• Twitter series
  ▫ Volume
  ▫ Sentiment index

• Forecasting model family
  ▫ Linear models
  ▫ Support vector machines
  ▫ Neural networks

Result: Does including Twitter data increase classification accuracy by 5%?

Study details

• Stock market prediction targets
  ▫ Companies: Apple, Google, …
  ▫ General market indices: S&P100, S&P500

• Box office data
  ▫ Daily sales revenue series

Summary Tree

• Helps to identify model parameters that lead to consistently positive or negative results

• Decision tree structure
  ▫ Nodes are the different parameters
  ▫ Leaves: result
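
To illustrate the idea, the sketch below picks the root split of such a tree: given a table of experiments (one row per parameter combination, with a yes/no outcome), it selects the parameter with the highest information gain. This is a generic decision-tree criterion assumed here for illustration; the slides do not state how the paper's summary tree is grown. The rows are invented toy data, not results from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcomes."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_split(experiments, parameters, outcome="improved"):
    """Return (best parameter, gains) using information gain on `outcome`."""
    base = entropy([e[outcome] for e in experiments])
    gains = {}
    for param in parameters:
        groups = defaultdict(list)
        for e in experiments:
            groups[e[param]].append(e[outcome])
        remainder = sum(len(g) / len(experiments) * entropy(g) for g in groups.values())
        gains[param] = base - remainder
    return max(gains, key=gains.get), gains

# Toy, invented rows: NOT results from the paper.
experiments = [
    {"family": "linear", "twitter": "volume", "improved": False},
    {"family": "linear", "twitter": "sentiment", "improved": False},
    {"family": "svm", "twitter": "volume", "improved": True},
    {"family": "svm", "twitter": "sentiment", "improved": True},
]
root, gains = best_split(experiments, ["family", "twitter"])
print(root, gains)
```

Applying the same selection recursively to each group would grow the full tree, with leaves holding the outcome.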

Summary Tree

Results: Stock market data

• Summary of prediction results:
  ▫ Generally, linear models do not provide a significant performance improvement with either Twitter volume or sentiment-based information
  ▫ Non-linear models can give an improvement!
  ▫ Neural-network-based models gave the best performance

Results: Stock market data

Results: Movie box office

• Summary:
  ▫ Sentiment analysis did not have a positive impact
  ▫ Volume information had a positive impact with linear regression and SVM

Conclusion

• In general, Twitter information, when used with non-linear models, increases the prediction accuracy for long-term stock market predictions

• Twitter volume had a linear relationship with movie sales, but sentiment analysis had none

Appendix

•Logarithmic returns of the series

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$$
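
A short sketch of turning a price series P into a return series R. Both the simple return (P[t] - P[t-1]) / P[t-1] and the logarithmic return ln(P[t] / P[t-1]) are shown, since the two are close for small price changes; the function names and prices are illustrative.

```python
import math

def simple_returns(prices):
    """(P[t] - P[t-1]) / P[t-1] for each consecutive pair of prices."""
    return [(p - q) / q for q, p in zip(prices, prices[1:])]

def log_returns(prices):
    """ln(P[t] / P[t-1]) for each consecutive pair of prices."""
    return [math.log(p / q) for q, p in zip(prices, prices[1:])]

# Invented prices, for illustration only.
prices = [100.0, 110.0, 99.0]
print(simple_returns(prices))
print(log_returns(prices))
```

Log returns add up over time: their sum equals the log of the last price divided by the first.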

Testing model adequacy

• Testing the relationship between the Twitter time series and the time series to be forecast

• Neglected nonlinearity
  ▫ Are the two time series non-linearly related?

• Granger causality
  ▫ X -> Y or Y -> X?
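
For reference, a standard textbook formulation of the Granger test for X -> Y is sketched below. The notation is assumed (lag order p, T usable observations, residual sums of squares RSS_r and RSS_u), and the slides do not say which variant the paper used. X is said to Granger-cause Y if adding lagged values of X significantly improves the prediction of Y, which means rejecting H_0. Swapping the roles of x and y tests Y -> X.

```latex
\begin{aligned}
\text{restricted:}\quad & y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \varepsilon_t \\
\text{unrestricted:}\quad & y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \sum_{i=1}^{p} \beta_i\, x_{t-i} + u_t \\
\text{null hypothesis:}\quad & H_0 : \beta_1 = \dots = \beta_p = 0 \\
\text{test statistic:}\quad & F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\end{aligned}
```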