Applications of news analytics in finance: a review
Gautam Mitra
Co-author: Leela Mitra
Summary and scope
In this talk we set out a structured (reading) guide to the published research outputs (journal papers, white papers, case studies) which are emerging in the domain of “news analytics” applied to finance.
We aim to provide insight into the subtle interplay of information technology (including AI), the quantitative models and behavioural biases in the context of trading and investment decisions.
Applications such as low frequency and high frequency trading are presented; some desirable/potential applications are discussed.
Outline
Introduction
News data
  Data sources
  Pre analysis of data
Determining sentiment scores
  General overview
  Das and Chen
  Lo
Models and applications in summary form
  (Abnormal) returns
  Volatility and risk control
Desirable industry applications
Summary and discussions
[Diagram: traders (high frequency) and fund managers (low frequency); desktop; market data and newswire feeds; data warehouse; datamart]
Introduction
R & D challenge: identify the killer application
Smart investors rapidly analyse/digest information.
News stories/announcements.
Stock price moves (market reactions).
Act promptly to take trading/investment decisions.
Can a machine act intelligently (AI) to compete with or outsmart humans?
At the least, can we have IT/AI tools which help humans make good investment decisions?
Intelligence amplification (gearing… an engineering concept)
Thus three disciplines converge:
Information Systems
AI, in particular, Natural Language Processing
Financial Engineering/quantitative Modelling
( including behavioural finance )
[Diagram: data analysis, datamart and quant models]
Inputs: mainstream news; pre-news; Web 2.0 / social media; (numeric) financial market data
Pre-analysis: classifiers; sentiment scores
Analysis: consolidated datamart; updated beliefs, ex-ante view of market environment
Quant models:
1. Return predictions
2. Fund management / trading decisions
3. Volatility estimates and risk control
News data: Data sources
Sources of news/informational flows (Leinweber)
News: mainstream media, reputable sources
  Newswires to traders’ desks
  Newspapers, radio and TV
Pre-news: source data
  SEC reports and filings
  Government agency reports
  Scheduled announcements, macroeconomic news, industry stats, company earnings reports…
Social media: blogs, websites and message boards
  Quality can vary significantly
  Barriers to entry low
  Human behaviour and agendas
Web based news
  Individual investors pay more attention than institutional investors (Das and Rieger)
  “Collective intelligence”: a large group of people (with no ulterior motives) whose collective opinion may be useful
  The SEC does monitor message boards, but vetting of information is far from perfect
Financial news can be split between
  Scheduled news (synchronous)
  Unscheduled news (asynchronous, event driven)
Scheduled news (synchronous)
  Arrives at pre-scheduled times
  Much of pre-news
  Structured format
  Often basic numerical format
  Typically macroeconomic announcements and earnings announcements
Macroeconomic announcements
  Widely used in automated trading
  Impact large and most liquid markets (foreign exchange, government debt, futures markets)
  Naturally affect trading strategies
  Speed and accuracy are key… technology requirements substantial
  Providers in this space: Trade the News, Need to Know News, Market News International, Thomson Reuters, Dow Jones, Bloomberg…
Earnings announcements
  Directly influence stock prices
  Widely anticipated and used in trading strategies
Unscheduled news (asynchronous, event driven)
  Arrives unexpectedly over time
  Mainstream news and social media
  Unstructured, qualitative, textual form
  Non-numeric
  Difficult to process quickly and quantitatively
  May contain information about effect and cause of an event
  To be applied in quant models, needs to be converted to an input time series
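The conversion of irregularly arriving stories into an input time series can be sketched as follows. This is a minimal illustration only: it assumes each story has already been given a numeric score, and the `(timestamp, score)` layout, the function name `to_time_series` and the choice of averaging within each bucket are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def to_time_series(stories, bucket_minutes=60):
    """Average story scores within fixed-width time buckets.

    stories: iterable of (timestamp, score) pairs. Both the layout and the
    score scale are hypothetical, chosen only for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in stories:
        # Floor the timestamp to the start of its bucket within the day.
        minute = (ts.hour * 60 + ts.minute) // bucket_minutes * bucket_minutes
        bucket = ts.replace(hour=minute // 60, minute=minute % 60,
                            second=0, microsecond=0)
        sums[bucket] += score
        counts[bucket] += 1
    # One value per bucket that contains at least one story.
    return {b: sums[b] / counts[b] for b in sorted(sums)}


stories = [
    (datetime(2010, 3, 1, 9, 5), 0.5),
    (datetime(2010, 3, 1, 9, 40), -0.1),
    (datetime(2010, 3, 1, 11, 15), 0.8),
]
series = to_time_series(stories)
```

Buckets with no stories are simply absent here; how to fill them (zero, carry forward, or leave missing) is a modelling choice the slides do not specify.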
News data: Pre analysis of data
Collecting, cleaning and analysing news data… challenging
Major newswire providers collect news from a wide range of sources
  e.g. Factiva database from Dow Jones, news from 400 sources
Tagging: machine readable metadata
  Major newswire providers tag incoming news stories
  Reporters tag stories as they enter them into the system
  Machine learning techniques also used to identify relevant tags (RavenPack)
  Unstructured stories converted into basic machine readable form
  Tags held in XML (a standard for metadata exchange)
  Reveals story’s topic areas and other useful metadata
Need to identify news which is relevant and current
  “Information events”: distinguish stories reporting on old news from genuinely “new” news
  Tetlock et al. event study shows “information leakage”
Reuters gives for each article
  Relevance scores… measure how much the article is about a particular company
  Novelty/uniqueness… determines the repetition among articles
RavenPack
  Distinguishes stories which are events
  Events carry the first mention of a particular theme
  Stories which are not events are excluded, to minimise the number of duplicate stories
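The idea of keeping only the first mention of a theme can be sketched as a simple filter. This is an assumption-laden illustration, not RavenPack's or Reuters' method: the dictionary keys `company` and `theme` and the function name `first_mentions` are hypothetical.

```python
def first_mentions(stories):
    """Keep only the first story seen for each (company, theme) pair.

    stories must already be sorted by arrival time. The keys used here are
    hypothetical and do not reflect any vendor's schema.
    """
    seen = set()
    events = []
    for story in stories:
        key = (story["company"], story["theme"])
        if key not in seen:
            # First mention of this theme for this company: treat as an event.
            seen.add(key)
            events.append(story)
    return events


stories = [
    {"company": "ABC", "theme": "earnings", "headline": "ABC beats forecasts"},
    {"company": "ABC", "theme": "earnings", "headline": "ABC results: analysis"},
    {"company": "XYZ", "theme": "merger", "headline": "XYZ in merger talks"},
]
events = first_mentions(stories)
```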
Classification of news
  Tagged stories provide hundreds of event types
  Need to distinguish what types of news are relevant to our application
  Market may react differently to different types of news
    e.g. Moniz et al. find market reacts more strongly to earnings news than strategic news
  Different news is available for different assets
    Larger companies, with more liquid stock, tend to have higher news coverage
Classification of news
Accounting related news
  Earnings
    Announcements of earnings
    Restatements of operating results etc.
  Trading updates
    Announcements of sales/trading statement etc.
Strategic news
  M&A related
    M&A rumours and discussion
    M&A transaction announcements etc.
  Restructuring issues etc.
Relationship of different news items / independence of news… important consideration
Seasonality of news (Hafez, Lo, Moniz)
  Need to be able to distinguish unexpected newsflow from variation due to seasonality
  Hourly, daily and weekly seasonality
  Intraday: larger volumes of newsflow just before opening of European, US and Asian stock markets (Hafez)
[Figure: Illustration of seasonality (Hafez, RavenPack)]
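One simple way to separate unexpected newsflow from intraday seasonality is to compare each observed story count with the historical average for the same hour of day. The sketch below assumes that approach; the ratio form of the adjustment and the names `hourly_profile` and `abnormal_flow` are illustrative choices, not taken from the cited work.

```python
from collections import defaultdict


def hourly_profile(counts):
    """Mean story count for each hour of day.

    counts: list of (hour_of_day, number_of_stories) observations.
    """
    totals = defaultdict(float)
    days = defaultdict(int)
    for hour, n in counts:
        totals[hour] += n
        days[hour] += 1
    return {h: totals[h] / days[h] for h in totals}


def abnormal_flow(hour, n, profile):
    """Observed count relative to the seasonal norm for that hour."""
    return n / profile[hour]


# Hypothetical history: busy just before a market open, quiet mid-afternoon.
history = [(8, 100), (8, 120), (14, 40), (14, 60)]
profile = hourly_profile(history)
```

A value well above 1 then flags newsflow that is heavy for that time of day, rather than heavy in absolute terms.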
Determining sentiment scores
Informational content of news: converting qualitative data into a quantitative form… challenging
Distinguish the sentiment of stories (positive/negative)
  Scale of positivity/negativity… sentiment scores
Consider the story’s context and language
  How positively/negatively a human interprets the story… emotive content
  Expert classification
  Psychosocial dictionaries, e.g. General Inquirer
  Different groups of people are affected by events differently or have different interpretations of the same events… conflicts may arise
Market based measures (Lo, Moniz et al. and Lavrenko)
  Market’s lagged relative change in returns/volatility for a particular asset (asset class)
Machine learning and natural language techniques can be used to determine sentiment of incoming stories… sentiment indices over time
Index validation: to use an index we must be able to find a relationship with relevant market variables
Das and Chen
Extract investor sentiment from stock message boards for the Morgan Stanley High Tech (MSH) Index
Web scraper program downloads tech sector message board messages
Five algorithms with different conceptual underpinnings are used to classify each message
A voting scheme is then applied
Three supplementary databases
  Dictionary: nature of the word (noun, adjective, adverb)
  Lexicon: collection of hand picked words which form variables for statistical inference within the algorithms
  Grammar: training corpus of base messages used in determining in-sample statistical information, then applied to the out-of-sample messages
Lexicon and grammar jointly determine the context of the sentiment
Five algorithms (classifiers)
1. Naïve classifier
  Based on word count of positive and negative connotation words
2. Vector distance classifier
  Each of the D words in the lexicon is assigned a dimension in vector space
  Each training message is pre-classified as positive, negative or neutral
  Each new message is classified by comparison to the cluster of pre-trained vectors and is assigned the same classification as that vector with which it has the smallest angle
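The first two classifiers can be sketched in a few lines. This is an illustration under stated assumptions rather than Das and Chen's implementation: the tiny lexicon, the whitespace tokenisation and the function names are hypothetical, and "smallest angle" is implemented as largest cosine similarity between word-count vectors.

```python
import math

# Hypothetical hand-picked lexicon, split by connotation.
LEXICON = ["buy", "strong", "gain", "sell", "weak", "loss"]
POSITIVE = {"buy", "strong", "gain"}
NEGATIVE = {"sell", "weak", "loss"}


def naive_classify(message):
    """Classify by net count of positive versus negative lexicon words."""
    words = message.lower().split()
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"


def to_vector(message):
    """Map a message to a D-dimensional vector of lexicon word counts."""
    words = message.lower().split()
    return [words.count(term) for term in LEXICON]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_classify(message, training):
    """Return the label of the training message with the smallest angle."""
    vec = to_vector(message)
    best = max(training, key=lambda pair: cosine(vec, to_vector(pair[0])))
    return best[1]


training = [("strong gain buy", "positive"), ("weak loss sell", "negative")]
```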
3. Discriminant based classifier
  The naïve classifier (NC) weights all words within the lexicon equally. The discriminant based classification method replaces this simple word count with a weighted word count.
  The weights reflect how well a particular lexicon word discriminates between the different message categories
4. Adjective-adverb phrase classifier
  This is based on the assumption that phrases which use adjectives and adverbs emphasise sentiment and require greater weight.
  Uses a word count, but only of those words within phrases containing adjectives and adverbs.
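A word's discriminating power can be illustrated with a Fisher-style ratio of between-class to within-class variance of its counts across training messages. The slides do not give the statistic, so the ratio below is an assumption chosen for illustration; the function name and the example counts are hypothetical.

```python
from statistics import mean, pvariance


def discriminant_weight(counts_by_class):
    """Between-class over within-class variance of a word's counts.

    counts_by_class: {label: [count of the word in each training message]}.
    This Fisher-style ratio is an illustrative assumption, not necessarily
    the statistic used in the original work.
    """
    class_means = [mean(c) for c in counts_by_class.values()]
    between = pvariance(class_means)
    within = mean([pvariance(c) for c in counts_by_class.values()])
    if within == 0:
        # Perfect separation (or a word that never varies at all).
        return float("inf") if between else 0.0
    return between / within


# Hypothetical counts: "gain" separates the classes, "the" does not.
w_gain = discriminant_weight({"positive": [2, 2, 1, 3], "negative": [0, 0, 1, 1]})
w_the = discriminant_weight({"positive": [1, 2, 1, 2], "negative": [2, 1, 2, 1]})
```

In a weighted word count, each lexicon word's contribution would then be multiplied by such a weight, so that uninformative words contribute little.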
5. Bayesian classifier
  Given the class of each message in the training set we can determine the frequency with which a lexical word appears in a particular class.
  For a new message we are able to compute the probability it falls within a particular class given its component lexicon words.
  The message is classified as being from the category with the highest probability.
Voting scheme… final classification based on achieving a majority amongst classifiers
  Reduces number of messages classified
  Enhances classification accuracy
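The Bayesian classifier and the voting step can be sketched as below. This is a minimal naive Bayes illustration, not the authors' code: Laplace (add-one) smoothing, whitespace tokenisation, the strict-majority rule and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_bayes(training, lexicon):
    """Count lexicon words per class. training: list of (message, label)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for message, label in training:
        class_counts[label] += 1
        for w in message.lower().split():
            if w in lexicon:
                word_counts[label][w] += 1
    return word_counts, class_counts


def bayes_classify(message, model, lexicon):
    """Return the class with the highest posterior log-probability."""
    word_counts, class_counts = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        logp = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(lexicon)
        for w in message.lower().split():
            if w in lexicon:
                # Add-one smoothing (an assumption) avoids zero probabilities.
                logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def majority_vote(labels):
    """Label chosen by a strict majority of classifiers, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


lexicon = {"buy", "gain", "sell", "loss"}
model = train_bayes([("buy gain", "positive"), ("sell loss", "negative")], lexicon)
```

Returning `None` when no majority exists is what reduces the number of messages classified: ambiguous messages are dropped rather than forced into a category.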
Ambiguity: stock message board messages are often highly ambiguous
  Use General Inquirer… determine optimism score
  Filter in and consider only the most highly optimistic stories in the positive category
  Filter in and consider only the most highly pessimistic stories in the negative category
  Number of false positives in classification declines
Disagreement: 0 no disagreement; 1 high disagreement
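The slide gives only the scale of the disagreement measure. One formula consistent with that scale, assumed here for illustration, is |1 - |(B - S)/(B + S)||, where B and S are the numbers of messages classified as buy and sell. The function name and the handling of the no-message case are hypothetical.

```python
def disagreement(n_buy, n_sell):
    """Disagreement in [0, 1]: 0 when all messages agree, 1 when evenly split.

    The formula is an assumption consistent with the 0-to-1 scale on the
    slide; returning 0.0 when there are no messages is also an assumption.
    """
    total = n_buy + n_sell
    if total == 0:
        return 0.0
    return abs(1 - abs((n_buy - n_sell) / total))
```

For example, ten buy messages and no sell messages give 0, while five of each give 1.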
Relationship between sentiment indices and market variables?
Nature of sentiment index? Positive sentiment bias
  Figure shows histogram of normalised sentiment for a stock… positively skewed
  RavenPack find positive bias in classifiers… more marked in bull markets
Relationship between sentiment indices and market variables
  Sentiment and stock levels are related… determining the precise nature of the price relationship is difficult
  Sentiment inversely related to disagreement
    Disagreement rises, sentiment falls
  Sentiment correlated to posting volume
    Discussion increases, indicates optimism about stock is rising
  Strong relationship between message volume and volatility (also Antweiler and Frank (2004))
  Strong relationship between trading volume and volatility
Lo
Reuters NewsScope Event Indices (NEI) are constructed to have predictive power for returns and realised volatility
  Integrated framework: returns and volatility used in calibrating indices
News data
  Reuters news alerts: quick news flashes issued when newsworthy events occur… timely and relevant
  Tags machine readable
  Headlines concise, small vocabulary… good for machine learning analysis
The following parameters are used
  List of keywords and phrases with real valued weights
  A rolling “sentiment window” of size r (say 5/10 minutes)
  A rolling calibration window of size R (say 90 days)
[symbol missing] is the vector of keyword frequencies over [span missing]
Raw score is defined as [formula missing]
  This will tend to be high when news volume is high… hence the normalised score
Normalised score
  At all times t in the R days of the calibration window, record the raw score and the news volume
  Normalised score is determined by comparing the current raw score against raw scores where news volume equals current news volume
  S_t = 0.92: 92% of the time that news volume is at the current level, the raw score is less than it currently is
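The normalisation described above amounts to an empirical percentile conditional on news volume. The sketch below assumes the raw score is a weighted sum of keyword frequencies (the formula itself is not recoverable from the slides, so this is an assumption) and that volumes are matched exactly; the function names and example data are hypothetical.

```python
def raw_score(weights, frequencies):
    """Weighted sum of keyword frequencies (assumed form of the raw score)."""
    return sum(weights[k] * frequencies.get(k, 0) for k in weights)


def normalised_score(current_raw, current_volume, history):
    """Fraction of past raw scores, at the same news volume, below the current one.

    history: list of (raw_score, news_volume) pairs recorded over the
    calibration window. Returns None if no past observation has the same
    news volume (exact matching is a simplifying assumption).
    """
    peers = [raw for raw, vol in history if vol == current_volume]
    if not peers:
        return None
    return sum(raw < current_raw for raw in peers) / len(peers)


weights = {"profit": 1.0, "lawsuit": -2.0}  # hypothetical keyword weights
current = raw_score(weights, {"profit": 3, "lawsuit": 1})
history = [(0.2, 5), (0.5, 5), (0.9, 5), (1.5, 5), (3.0, 9)]
score = normalised_score(current, 5, history)
```

In practice volumes would probably be grouped into bands rather than matched exactly, since an exact match may be rare in a 90-day window.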
Model calibration
Determine keywords
  Create list of keywords by hand
  Tool to extract news from periods when scores are high… determine whether keywords are legitimate or need adjusting
Optimal weights for intraday return sentiment index
  Regress word frequencies against intraday returns
Optimal weights for intraday volatility sentiment index
  Regress word frequencies against (deseasonalised) intraday realised volatility
Determining optimal weights is a more general classification problem
  Other techniques… machine learning… perceptron algorithm, support vector machines…
Index validation: to establish empirical significance of indices… event study analysis
  Event is defined when the (return/volatility sentiment) index exceeds a threshold value (0.995)
  Remove events that follow within less than one hour of another event… consider only “new” events
  Tests null hypothesis: distribution of returns / deseasonalised realised volatility is the same before and after an event
    Visual inspection
    t-test for equality of means
    Levene’s test for change in standard deviation
    Chi-squared goodness of fit
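The event definition and the equality-of-means test can be sketched as follows. This is a simplified illustration: it assumes the index is sampled at known minute offsets, treats "another event" as the preceding threshold exceedance, and uses a Welch-type t-statistic; none of these details are confirmed by the slides, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def find_events(index, threshold=0.995, min_gap=60):
    """Times (in minutes) at which the index signals a "new" event.

    index: list of (minute, value) pairs sorted by time. An exceedance is
    dropped if it follows the previous exceedance by less than min_gap
    minutes (one reading of the one-hour rule).
    """
    events, last = [], None
    for minute, value in index:
        if value > threshold:
            if last is None or minute - last >= min_gap:
                events.append(minute)
            last = minute
    return events


def welch_t(before, after):
    """t-statistic for a difference in means, allowing unequal variances."""
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se


index = [(0, 0.999), (30, 0.997), (80, 0.999), (200, 0.5), (300, 0.996)]
events = find_events(index)
```

The t-statistic would be compared with the appropriate critical value; Levene's test and the chi-squared test listed above are not sketched here.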
Models and applications: (abnormal) returns
[Figure: Average stock price reaction to negative news events. Source: Macquarie Quant Research, May 2009]
[Figure: Average stock price reaction to positive news events. Source: Macquarie Quant Research, May 2009]
Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha
News data can be used
Stock picking and generating trading signal
Factor models
Exploit behavioural biases in investor decisions
Stock picking and generating trading signals
Li (2006): simple ranking procedure… identify stocks with positive and negative sentiment
  10-K SEC filings for non-financial firms, 1994 to 2005
  Risk sentiment measure: count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section
  Strategy: long in low risk sentiment stocks, short in high risk sentiment stocks… reasonable level of returns
Leinweber (2010): event studies based on Reuters NewsScope Sentiment Engine
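The word-count measure attributed to Li (2006) above is straightforward to sketch. The ranking step below is a simplification for illustration: it ranks on the level of the count only, and the tickers, texts and function names are hypothetical.

```python
import re

RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}


def risk_sentiment(text):
    """Number of occurrences of the listed risk words in a filing section."""
    return sum(1 for w in re.findall(r"[a-z]+", text.lower()) if w in RISK_WORDS)


def long_short(filings, n=1):
    """Rank tickers by risk sentiment; long the n lowest, short the n highest.

    filings: {ticker: text of the management discussion and analysis section}.
    """
    ranked = sorted(filings, key=lambda t: risk_sentiment(filings[t]))
    return ranked[:n], ranked[-n:]


# Hypothetical filings text.
filings = {
    "AAA": "Outlook is stable.",
    "BBB": "Risks remain; demand is uncertain and risky.",
    "CCC": "Some uncertainty persists.",
}
longs, shorts = long_short(filings)
```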
Factor models
CAPM (Sharpe 1964; Lintner 1965), APT (Ross 1976)… additional sources of information to market
“Profits may be viewed as the economic rents which accrue to [the] competitive advantage of … superior information, superior technology, financial innovation” (Lo)
Tetlock, Saar-Tsechansky and Macskassy (2008)
  Investors’ perception… determined from… their “information sets”
“Information sets”
1. Analysts’ forecasts
2. Quantifiable publicly disclosed accounting variables
3. Linguistic descriptions of firm’s current and future profit generating activities
If 1. and 2. are incomplete or biased, 3. may give relevant information
Macquarie report, Cahan et al.: news sentiment data in multifactor models
  Results are positive… such an approach does add value
  In particular they note the value of this source of information during the credit crisis, when determining fundamentals (which traditional quant factors are based on) was problematic
Behavioural biases
Behavioural economists challenge the assumption that markets act rationally… EMH, AMH (Lo)
Propose individuals display certain biased behaviour
Due to biases they systematically deviate from optimal (rational) trading behaviour
Use behavioural biases to explain (abnormal) returns, rather than risk based explanations.
Behavioural biases
Odean and Barber (2007) find evidence that individual investors have a tendency to buy attention grabbing stocks.
Professional investors are better equipped to assess a wider range of stocks, so they are less prone to buying attention grabbing stocks.
Da, Engelberg and Gao also consider how the amount of attention a stock receives affects the cross-section of returns.
Use the frequency of Google searches for a particular company as a measure of attention.
Find some evidence that changes in investor attention can predict the cross-section of returns.
Behavioural biases
Chan (2003) finds stocks with major public news exhibit momentum over the following month.
In contrast, stocks with large price movements but an absence of news tend to show return reversals in the following month.
This would support a trading strategy based on momentum reinforced with news signals.
Moniz et al. (2009) find a strategy based on earnings momentum reinforced by newsflow is effective.
Applications: Risk management
Traditionally, historic asset price data has been used to estimate risk measures
  Ex post, retrospective measures
  Fail to account for developments in the market environment, investor sentiment and knowledge
Significant changes in the market environment
  Traditional measures can fail to capture the true level of risk (Mitra, Mitra and diBartolomeo 2009; diBartolomeo and Warrick 2005)
Incorporating measures or observations of the market environment in risk estimation is important
The risk structure of assets may change over time
Patton and Verardo find that news impacts the beta of stocks; in particular, most of the beta increase comes from rising covariance, suggesting there is contagion in the information content of news releases.
Relationship between information release and volatility widely reported
  Ederington and Lee (1993): macroeconomic announcements and foreign exchange and interest rate futures
  Stock message board activity is a good predictor of volatility: Antweiler and Frank (2004); Wysocki (1999)
  GARCH model with news inputs: Kalev et al. (2004); Robertson, Geva and Wolff (2007)
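A GARCH model with news inputs can be illustrated by adding a news term to the conditional variance recursion of a GARCH(1,1). The sketch below assumes the form sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1] + gamma * news[t], with news measured as a story count; the exact specifications in the cited papers may differ, and the parameter values here are made up rather than estimated.

```python
def garch_news_variance(returns, news_counts, omega, alpha, beta, gamma, sigma2_0):
    """Conditional variance path of a GARCH(1,1) with an added news term.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1] + gamma * news[t]
    The additive, contemporaneous news term is an illustrative assumption.
    """
    sigma2 = [sigma2_0]
    for t in range(1, len(returns)):
        sigma2.append(
            omega
            + alpha * returns[t - 1] ** 2
            + beta * sigma2[t - 1]
            + gamma * news_counts[t]
        )
    return sigma2


# Made-up inputs and parameters, for illustration only.
path = garch_news_variance(
    returns=[1.0, 2.0, 0.0],
    news_counts=[0, 1, 3],
    omega=0.1, alpha=0.2, beta=0.5, gamma=0.1,
    sigma2_0=1.0,
)
```

With gamma greater than zero, periods of heavier newsflow raise the variance estimate even before large returns have been observed.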
Desirable industry applications
1. Enhanced strategies (asset management)
Low frequency: portfolio (rebalancing) early trigger based on “draw down” rules/risk.
High frequency:
• Trading “wish to” trade signals.
• Trading “have to/need to trade sell and buy” signals.
• News analytics market views taken into consideration for the “optimal trade execution” algorithms.
{ VWAP, Almgren & Chriss, Lo & Bertsimas }
2. Risk control and compliance
Improved short term risk estimate.
Enhanced downside risk estimate (improving scenario generators by using sentiment scores).
???
Wolf detection: signal to stop trading in a specific stock/asset.
3. Post trade analysis (reporting).
4. Refine fundamental research (results/figures).
5. Use by regulator/public body (government treasuries) to take a prior view of the “impact” of (economic and other) announcements.
Summary & discussions
Applications of (semi-)automated news analytics in finance are growing in importance.
Payback can be substantial to:
Investment Managers
Traders
Internal Risk Auditors
Regulators
Knowledge and skills from three different disciplines are required, in various degrees, to progress the field and make a substantial impact:
Information systems.
Artificial intelligence.
Financial engineering and quantitative modelling (including behavioural finance).