
Applications of news analytics in finance: a review Gautam Mitra Co-author Leela Mitra


Summary and scope

In this talk we set out a structured (reading) guide to the published research outputs (journal papers, white papers and case studies) which are emerging in the domain of “news analytics” applied to finance.

We aim to provide insight into the subtle interplay of information technology (including AI), the quantitative models and behavioural biases in the context of trading and investment decisions.

Applications such as low frequency and high frequency trading are presented; some desirable/potential applications are discussed.

Outline

• Introduction
• News data
    • Data sources
    • Pre analysis of data
• Determining sentiment scores
    • General overview
    • Das and Chen
    • Lo
• Models and applications in summary form
    • (abnormal) Returns
    • Volatility and risk control
• Desirable industry applications
• Summary and discussions

News.

Market Environment.

Sentiment.

Investment Decisions.

Risk Control.

Introduction

Traders [ High Frequency ]

Fund Managers [ Low Frequency ]

Desktop

• Market Data

• NewsWire

Data WareHouse

DataMart

Introduction

R & D Challenge: Identify Killer Application

Smart investors rapidly analyse/digest information.

News stories/announcements.

Stock price moves (market reactions).

Act promptly to take trading/investment decisions.

Can a machine act intelligently (AI) to compete with or outsmart humans?

Introduction

At the least, can we have IT/AI tools which help humans make good investment decisions?

Intelligence Amplification (gearing… an engineering concept)

Thus three disciplines converge:

Information Systems

AI, in particular, Natural Language Processing

Financial Engineering/Quantitative Modelling (including behavioural finance)

Introduction

Data analysis Datamart quant models

Mainstream News

Pre-News

Web 2.0 Social Media

Pre-Analysis Classifiers

Sentiment Scores

(Numeric) financial market data

Analysis Consolidated Datamart

Updated beliefs, Ex-ante view of market environment

Quant Models

1. Return predictions
2. Fund management / trading decisions
3. Volatility estimates and risk control


News data: Data sources

Sources of news/informational flows (Leinweber)

• News: mainstream media, reputable sources. Newswires to traders' desks. Newspapers, radio and TV.
• Pre-news: source data. SEC reports and filings. Government agency reports. Scheduled announcements, macroeconomic news, industry stats, company earnings reports…
• Social media: blogs, websites and message boards. Quality can vary significantly. Barriers to entry are low. Human behaviour and agendas.

News data: Data sources

Web based news
• Individual investors pay more attention than institutional investors (Das and Rieger).
• “Collective intelligence”: the collective opinion of a large group of people (with no ulterior motives) may be useful.
• The SEC does monitor message boards, but the vetting of information is far from perfect.

Financial news can be split between
• Scheduled news (synchronous)
• Unscheduled news (asynchronous, event driven)

News data: Data sources

Scheduled news (synchronous)
• Arrives at pre-scheduled times
• Much of pre-news
• Structured format
• Often basic numerical format
• Typically macroeconomic announcements and earnings announcements

News data: Data sources

Macroeconomic announcements
• Widely used in automated trading
• Impact the large, most liquid markets (foreign exchange, government debt, futures markets)
• Naturally affect trading strategies
• Speed and accuracy are key… technology requirements are substantial
• Providers in this space: Trade the News, Need to Know News, Market News International, Thomson Reuters, Dow Jones, Bloomberg…

Earnings announcements
• Directly influence stock prices
• Widely anticipated and used in trading strategies

News data: Data sources

Unscheduled news (asynchronous, event driven)
• Arrives unexpectedly over time
• Mainstream news and social media
• Unstructured, qualitative, textual form
• Non-numeric
• Difficult to process quickly and quantitatively
• May contain information about the effect and cause of an event
• To be applied in quant models, it needs to be converted to an input time series
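The last point, turning unscheduled stories into an input time series, can be sketched as follows. This is only an illustration: it assumes each story has already been given a numeric sentiment score, and the function name `hourly_sentiment` and the hourly bucket are hypothetical choices, not something taken from the talk.

```python
from collections import defaultdict
from datetime import datetime


def hourly_sentiment(stories):
    """Aggregate (timestamp, score) pairs into an hourly time series.

    Returns a time-ordered list of (hour_start, mean_score, story_count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, score in stories:
        # Floor each timestamp to the start of its hour (assumed bucket size).
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        sums[bucket] += score
        counts[bucket] += 1
    return [(b, sums[b] / counts[b], counts[b]) for b in sorted(sums)]


stories = [
    (datetime(2009, 5, 1, 9, 5), 0.5),
    (datetime(2009, 5, 1, 9, 40), -0.25),
    (datetime(2009, 5, 1, 11, 0), 1.0),
]
series = hourly_sentiment(stories)
```

Hours with no stories are simply absent from the output; a quant model would have to decide how to treat those gaps.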


News data: Pre analysis of data

Collecting, cleaning and analysing news data… challenging

Major newswire providers collect news from a wide range of sources, e.g. the Factiva database from Dow Jones, with news from 400 sources

Tagging – machine readable metadata
• Major newswire providers tag incoming news stories
• Reporters tag stories as they enter them into the system
• Machine learning techniques are also used to identify relevant tags (RavenPack)
• Unstructured stories are turned into basic machine readable form
• Tags are held in XML (a standard for metadata exchange)
• Tags reveal the story's topic areas and other useful metadata

News data: Pre analysis of data

Need to identify news which is relevant and current
• “Information events”: distinguish stories reporting on old news from genuinely “new” news
• Tetlock et al. event study shows “information leakage”

News data: Pre analysis of data

Need to identify news which is relevant and current

Reuters gives, for each article
• Relevance scores… measure how much the article is about a particular company
• Novelty/uniqueness… determines the repetition among articles

RavenPack
• Distinguishes stories which are events: these carry the first mention of a particular theme
• Stories which are not events are excluded, to minimise the number of duplicate stories
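A minimal sketch of this kind of filtering, assuming each story arrives as a dictionary with hypothetical fields `time`, `company`, `theme` and `relevance`. These field names and the 0.8 cut-off are illustrative only and are not the vendors' actual schemas or thresholds.

```python
def select_events(stories, min_relevance=0.8):
    """Keep the first sufficiently relevant story for each (company, theme)."""
    seen = set()
    events = []
    for story in sorted(stories, key=lambda s: s["time"]):
        if story["relevance"] < min_relevance:
            continue  # the story is not mainly about this company
        key = (story["company"], story["theme"])
        if key in seen:
            continue  # a repeat of a theme already reported
        seen.add(key)
        events.append(story)
    return events


stories = [
    {"time": 2, "company": "ABC", "theme": "earnings", "relevance": 0.9},
    {"time": 1, "company": "ABC", "theme": "earnings", "relevance": 0.95},
    {"time": 3, "company": "ABC", "theme": "merger", "relevance": 0.3},
    {"time": 4, "company": "XYZ", "theme": "earnings", "relevance": 1.0},
]
events = select_events(stories)
```

Here the story at time 2 is dropped as a repeat and the story at time 3 is dropped for low relevance.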

News data: Pre analysis of data

Classification of news
• Tagged stories provide hundreds of event types
• Need to distinguish what types of news are relevant to our application
• The market may react differently to different types of news, e.g. Moniz et al. find the market reacts more strongly to earnings news than to strategic news
• Different news is available for different assets: larger companies, with more liquid stock, tend to have higher news coverage

News data: Pre analysis of data

Classification of news

Accounting related news
• Earnings: announcements of earnings, restatements of operating results etc…
• Trading updates: announcements of sales/trading statements etc…

Strategic news
• M&A related: M&A rumours and discussion, M&A transaction announcements etc…
• Restructuring issues etc…

News data: Pre analysis of data

Relationship of different news items / independence of news… an important consideration

Seasonality of news (Hafez, Lo, Moniz)
• Need to be able to distinguish unexpected newsflow from variation due to seasonality
• Hourly, daily and weekly seasonality
• Intraday: larger volumes of newsflow just before the opening of European, US and Asian stock markets (Hafez)
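One simple way to separate unexpected newsflow from seasonal variation is to compare each hour's story count with the historical average for that hour of the day. The sketch below assumes this ratio-to-average approach; the talk does not say which deseasonalisation method the cited authors use, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(observations):
    """Average story count for each hour of day from (hour, count) pairs."""
    by_hour = defaultdict(list)
    for hour, count in observations:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in by_hour.items()}


def abnormal_volume(hour, count, profile):
    """Ratio of the observed count to the seasonal norm for that hour."""
    return count / profile[hour]


history = [(8, 10), (8, 30), (14, 5), (14, 15)]
profile = hourly_profile(history)
ratio = abnormal_volume(8, 40, profile)
```

A ratio well above 1 flags newsflow that is heavy even for a normally busy hour, such as the run-up to a market open. Daily and weekly patterns could be handled the same way with a richer key.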

News data: Pre analysis of data

Illustration of Seasonality (Hafez, RavenPack)


Determining sentiment scores

Informational content of news: converting qualitative data into a quantitative form… challenging

Distinguish the sentiment of stories (positive/negative) and the scale of positivity/negativity… sentiment scores

Consider the story's context and language
• How positively/negatively a human interprets the story… emotive content
• Expert classification
• Psychosocial dictionaries, e.g. General Inquirer
• Different groups of people are affected by events differently, or have different interpretations of the same events… conflicts may arise

Determining sentiment scores

Market based measures (Lo, Moniz et al. and Lavrenko): the market's lagged relative change in returns/volatility for a particular asset (asset class)

Machine learning and natural language techniques can be used to determine the sentiment of incoming stories… sentiment indices over time

Index validation: to use an index we must be able to find a relationship with relevant market variables


Das and Chen

Extract investor sentiment from stock message boards for the Morgan Stanley High Tech (MSH) Index
• Web scraper program downloads tech sector message board messages
• Five algorithms with different conceptual underpinnings are used to classify each message
• A voting scheme is then applied

Das and Chen

Three supplementary databases

Dictionary – nature of the word: noun, adjective, adverb.

Lexicon - collection of hand picked words which form variables for statistical inference within the algorithms

Grammar – training corpus of base messages used in determining in-sample statistical information. Applied for use on the out-of-sample messages

Lexicon and grammar jointly determine the context of the sentiment

Das and Chen

Five algorithms (= classifiers)

1. Naïve classifier
• Based on a word count of positive and negative connotation words

2. Vector distance classifier
• Each of the D words in the lexicon is assigned a dimension in vector space
• Each training message is pre-classified as positive, negative or neutral
• Each new message is classified by comparison to the cluster of pre-trained vectors and is assigned the same classification as the vector with which it has the smallest angle
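The first two classifiers can be sketched in a few lines. This is an illustrative toy, not Das and Chen's implementation: the six-word lexicon is made up, and finding the "smallest angle" is done by maximising cosine similarity against each training message.

```python
import math

# Hypothetical hand-picked lexicon, for illustration only.
POSITIVE = {"buy", "strong", "beat"}
NEGATIVE = {"sell", "weak", "miss"}
LEXICON = sorted(POSITIVE | NEGATIVE)


def naive_classify(message):
    """Classifier 1: net count of positive minus negative lexicon words."""
    words = message.lower().split()
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if net > 0 else "negative" if net < 0 else "neutral"


def to_vector(message):
    """One dimension per lexicon word, holding that word's count."""
    words = message.lower().split()
    return [words.count(term) for term in LEXICON]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_classify(message, training):
    """Classifier 2: label of the training message at the smallest angle."""
    vec = to_vector(message)
    best = max(training, key=lambda pair: cosine(vec, to_vector(pair[0])))
    return best[1]


training = [("strong buy", "positive"), ("weak sell", "negative")]
label = vector_classify("buy buy strong", training)
```

A message containing no lexicon words has a zero vector, so its cosine is 0 against everything and the first training message wins by default; a real system would need to handle that case explicitly.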

Das and Chen

3. Discriminant based classifier
• The naïve classifier (NC) weights all words within the lexicon equally. The discriminant based classification method replaces this simple word count with a weighted word count.
• The weights reflect how well a particular lexicon word discriminates between the different message categories

4. Adjective-adverb phrase classifier
• This is based on the assumption that phrases which use adjectives and adverbs emphasize sentiment and require greater weight.
• Uses a word count, but only of those words within phrases containing adjectives and adverbs.

Das and Chen

5. Bayesian classifier
• Given the class of each message in the training set, we can determine the frequency with which a lexical word appears in a particular class.
• For a new message we are able to compute the probability that it falls within a particular class, given its component lexicon words.
• The message is classified as being from the category with the highest probability.
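The Bayesian classifier described above can be sketched as a standard naive Bayes model over lexicon words. The add-one (Laplace) smoothing used here is an assumption added so that unseen words do not give zero probability; the slide does not say how Das and Chen handle that case.

```python
import math
from collections import Counter, defaultdict


def train(messages, lexicon):
    """Count lexicon words per class from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(
            w for w in text.lower().split() if w in lexicon
        )
    return word_counts, class_counts


def classify(text, word_counts, class_counts, lexicon):
    """Return the class with the highest posterior probability."""
    words = [w for w in text.lower().split() if w in lexicon]
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(lexicon)
        logp = math.log(n / total)  # class prior
        for w in words:
            # Add-one smoothing (an assumption, see the lead-in).
            logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


lexicon = {"buy", "sell", "strong", "weak"}
messages = [
    ("strong buy", "positive"),
    ("buy buy", "positive"),
    ("weak sell", "negative"),
]
word_counts, class_counts = train(messages, lexicon)
label = classify("buy strong", word_counts, class_counts, lexicon)
```

Working in log probabilities avoids numerical underflow on long messages.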

Voting scheme… final classification based on achieving a majority amongst the classifiers
• Reduces the number of messages classified
• Enhances classification accuracy
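A minimal sketch of such a voting scheme, assuming a simple majority (at least three of five classifiers must agree) and that messages without a majority are left unclassified. The exact threshold is an assumption for illustration.

```python
from collections import Counter


def vote(labels, min_votes=3):
    """Return the majority label, or None if no label reaches min_votes."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_votes else None


agreed = vote(["positive", "positive", "positive", "neutral", "negative"])
split = vote(["positive", "positive", "negative", "negative", "neutral"])
```

Returning `None` for split decisions is what reduces the number of messages classified while raising the accuracy of those that remain.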

Das and Chen

Ambiguity: stock message board messages are often highly ambiguous
• Use General Inquirer… determine optimism score
• Filter in and consider only the most highly optimistic stories in the positive category
• Filter in and consider only the most highly pessimistic stories in the negative category
• Number of false positives in classification declines

Disagreement: 0 means no disagreement; 1 means high disagreement
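The slide gives only the range of the disagreement measure. One form that matches that range, and which is recalled here from Das and Chen's paper rather than taken from the slide (so treat it as an assumption), uses the counts of buy and sell messages:

```python
def disagreement(n_buy, n_sell):
    """Assumed form: |1 - |(B - S) / (B + S)||, with B buys and S sells.

    Gives 0 when all messages agree and 1 when buys and sells are balanced.
    """
    total = n_buy + n_sell
    if total == 0:
        raise ValueError("no classified messages in the period")
    return abs(1 - abs((n_buy - n_sell) / total))


unanimous = disagreement(10, 0)
balanced = disagreement(5, 5)
```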

Das and Chen

Relationship between sentiment indices and market variables? Nature of the sentiment index?

Positive sentiment bias
• Figure shows a histogram of normalised sentiment for a stock… positively skewed
• RavenPack find a positive bias in classifiers… more marked in bull markets

Das and Chen

Relationship between sentiment indices and market variables
• Sentiment and stock levels are related… determining the precise nature of the price relationship is difficult
• Sentiment is inversely related to disagreement: as disagreement rises, sentiment falls
• Sentiment is correlated with posting volume: an increase in discussion indicates that optimism about the stock is rising
• Strong relationship between message volume and volatility (also Antweiler and Frank (2004))
• Strong relationship between trading volume and volatility


Lo

Reuters NewsScope Event Indices (NEI) are constructed to have predictive power for returns and realised volatility
• Integrated framework: returns and volatility are used in calibrating the indices

News data
• Reuters news alerts: quick news flashes issued when newsworthy events occur… timely and relevant
• Tags are machine readable
• Headlines are concise, with a small vocabulary… good for machine learning analysis

Lo

The following parameters are used
• A list of keywords and phrases with real valued weights
• A rolling “sentiment window” of size r (say 5/10 minutes)
• A rolling calibration window of size R (say 90 days)
• The vector of keyword frequencies over the sentiment window

The raw score is computed from the keyword weights and frequencies; it will tend to be high when news volume is high… hence a normalised score

Lo

Normalised score
• At all times t in the R days of the calibration window, record the raw score and the news volume
• The normalised score is determined by comparing the current raw score against raw scores where news volume equals the current news volume
• St = 0.92: for 92% of the time that news volume was at the current level, the raw score was less than it currently is
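The scoring steps can be sketched as follows. The raw score formula did not survive on the slide, so the weighted sum of keyword frequencies used here is an assumption consistent with the listed parameters, not Lo's confirmed definition. The normalised score follows the description above: the fraction of calibration-window observations with the same news volume whose raw score was lower than the current one.

```python
def raw_score(weights, frequencies):
    """Assumed form: weighted sum of keyword frequencies in the window."""
    return sum(w * frequencies.get(k, 0) for k, w in weights.items())


def normalised_score(current_raw, current_volume, history):
    """Percentile of current_raw among same-volume calibration records.

    history holds (raw_score, news_volume) pairs from the calibration window.
    Returns None when no record has the same news volume.
    """
    peers = [raw for raw, volume in history if volume == current_volume]
    if not peers:
        return None
    return sum(raw < current_raw for raw in peers) / len(peers)


weights = {"profit": 1.0, "loss": -2.0}  # hypothetical keyword weights
raw = raw_score(weights, {"profit": 3, "loss": 1})
history = [(0.5, 4), (2.0, 4), (0.0, 4), (9.0, 7), (-1.0, 4)]
score = normalised_score(raw, 4, history)
```

Conditioning on news volume is what stops the score from being high merely because many headlines arrived at once.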

Lo

Model calibration

Determine keywords
• Create list of keywords by hand
• Tool to extract news from periods when scores are high… determine whether keywords are legitimate or need adjusting

Optimal weights for intraday return sentiment index
• Regress word frequencies against intraday returns

Optimal weights for intraday volatility sentiment index
• Regress word frequencies against (deseasonalised) intraday realised volatility

Lo

Model calibration

Determining optimal weights more general classification problem

Other techniques…machine learning…perceptron algorithm, support vector machines…

Lo

Index validation – to establish empirical significance of indices… event study analysis
• An event is defined when the (return/volatility sentiment) index exceeds a threshold value (0.995)
• Remove events that follow within less than one hour of another event… consider only “new” events
• Test the null hypothesis: the distribution of returns / deseasonalised realised volatility is the same before and after an event
    • Visual inspection
    • t-test for equality of means
    • Levene's test for change in standard deviation
    • Chi-squared goodness of fit
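The event definition and the test for equality of means can be sketched as below. Two choices here are assumptions: the one-hour gap is measured from the previous threshold crossing (whether or not that crossing was kept as an event), and the t statistic is the unequal-variance (Welch) form, since the slide does not say which variant is used.

```python
from datetime import datetime, timedelta
from math import sqrt
from statistics import mean, variance


def detect_events(index, threshold=0.995, min_gap=timedelta(hours=1)):
    """Timestamps where the index exceeds threshold, keeping only 'new' events.

    index is a time-ordered list of (timestamp, score) pairs.
    """
    events = []
    last_crossing = None
    for timestamp, score in index:
        if score > threshold:
            if last_crossing is None or timestamp - last_crossing >= min_gap:
                events.append(timestamp)
            last_crossing = timestamp  # gap measured from any crossing
    return events


def welch_t(before, after):
    """t statistic for equal means, without assuming equal variances."""
    standard_error = sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(after) - mean(before)) / standard_error


day = datetime(2009, 5, 1)
index = [
    (day.replace(hour=9), 0.999),
    (day.replace(hour=9, minute=30), 0.998),
    (day.replace(hour=11), 0.5),
    (day.replace(hour=12), 0.997),
]
events = detect_events(index)
t_stat = welch_t([1, 2, 3], [3, 4, 5])
```

The statistic would then be compared with a t distribution to obtain a p-value; that step is omitted to keep the sketch dependency-free.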


RavenPack Sentiment Scores

Reuters NewsScope Sentiment Engine


Model & Applications… (abnormal) Returns

Average Stock Price Reaction to Negative News Events
Source: Macquarie Quant Research – May 2009

Model & Applications… (abnormal) Returns

Average Stock Price Reaction to Positive News Events
Source: Macquarie Quant Research – May 2009

Model & Applications… (abnormal) Returns

Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha

News data can be used

Stock picking and generating trading signal

Factor models

Exploit behavioural biases in investor decisions

Model & Applications… (abnormal) Returns

Stock picking and generating trading signal

Li (2006): simple ranking procedure… identify stocks with positive and negative sentiment
• 10-K SEC filings for non-financial firms, 1994–2005
• Risk sentiment measure: count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section
• Strategy: long in low risk sentiment stocks, short in high risk sentiment stocks… reasonable level of returns

Leinweber (2010): event studies based on Reuters NewsScope Sentiment Engine
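Li's risk sentiment measure, as summarised above, amounts to a word count followed by a ranking. The sketch below follows that summary only; the tokenisation, the tickers and the choice of one stock per leg are illustrative assumptions, and any further transformations in the original study are not reproduced.

```python
import re

RISK_WORDS = {
    "risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties",
}


def risk_sentiment(text):
    """Count occurrences of the risk words in a filing section."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(token in RISK_WORDS for token in tokens)


def long_short(scores, n=1):
    """Rank by risk sentiment: long the n lowest, short the n highest."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:n], ranked[-n:]


count = risk_sentiment("Risks remain; the outlook is uncertain and risky.")
longs, shorts = long_short({"AAA": 2, "BBB": 9, "CCC": 5})  # made-up tickers
```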

Model & Applications… (abnormal) Returns

Factor models

CAPM (Sharpe 1964; Lintner 1965), APT (Ross 1976)… additional sources of information to market

“Profits may be viewed as the economic rents which accrue to [the] competitive advantage of … superior information, superior technology, financial innovation” (Lo)

Tetlock, Saar-Tsechansky and Macskassy (2008)

Investors' perception… determined from… their “information sets”

Model & Applications… (abnormal) Returns

Factor models

“Information sets”
1. analysts' forecasts
2. quantifiable publicly disclosed accounting variables
3. linguistic descriptions of the firm's current and future profit generating activities

If 1. and 2. are incomplete or biased, 3. may give relevant information

Macquarie report, Cahan et al.: news sentiment data in multifactor models
• Results are positive… such an approach does add value.
• In particular they note the value of this source of information during the credit crisis, when determining fundamentals (on which traditional quant factors are based) was problematic.

Model & Applications… (abnormal) Returns

Behavioural biases

Behavioural economists challenge the assumption that markets act rationally… EMH, AMH (Lo)

Propose individuals display certain biased behaviour

Due to biases they systematically deviate from optimal (rational) trading behaviour

Use behavioural biases to explain (abnormal) returns, rather than risk based explanations.

Model & Applications… (abnormal) Returns

Behavioural biases

Odean and Barber (2007) find evidence that individual investors have a tendency to buy attention grabbing stocks.

Professional investors are better equipped to assess a wider range of stocks, so they are less prone to buying attention grabbing stocks.

Da, Engelberg and Gao also consider how the amount of attention a stock receives affects the cross-section of returns.
• Use the frequency of Google searches for a particular company as a measure of attention.
• Find some evidence that changes in investor attention can predict the cross-section of returns.

Model & Applications… (abnormal) Returns

Behavioural biases

Chan (2003) finds stocks with major public news exhibit momentum over the following month.

In contrast, stocks with large price movements but an absence of news tend to show return reversals in the following month.

This would support a trading strategy based on momentum reinforced with news signals.

Moniz et al. (2009) find a strategy based on earnings momentum reinforced by newsflow is effective.


Applications: Risk management

Traditionally, historic asset price data has been used to estimate risk measures
• Ex post, retrospective measures
• Fail to account for developments in the market environment, investor sentiment and knowledge

Significant changes in the market environment
• Traditional measures can fail to capture the true level of risk (Mitra, Mitra and diBartolomeo 2009; diBartolomeo and Warrick 2005)

Incorporating measures or observations of the market environment in risk estimation is important

Applications: Risk management

The risk structure of assets may change over time

Patton and Verardo find that news impacts the beta of stocks; in particular, most of the beta increase comes from rising covariance, suggesting there is contagion in the information content of news releases.

Applications: Risk management

Relationship between information release and volatility widely reported
• Ederington and Lee (1993): macroeconomic announcements and foreign exchange and interest rate futures
• Stock message board activity is a good predictor of volatility: Antweiler and Frank (2004); Wysocki (1999)
• GARCH model with news inputs: Kalev et al. (2004); Robertson, Geva and Wolff (2007)
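As a rough illustration of what a GARCH model with news inputs looks like, the sketch below runs a GARCH(1,1) conditional variance recursion with one extra term proportional to a news count. The exact specification, the timing of the news variable and all parameter values are assumptions for illustration; they are not taken from the cited papers, and in practice the parameters would be estimated by maximum likelihood.

```python
def garch_news_variance(returns, news_counts, omega, alpha, beta, gamma,
                        initial_variance):
    """Conditional variances under an assumed news-augmented GARCH(1,1).

    variance[t+1] = omega + alpha * r[t]**2 + beta * variance[t]
                    + gamma * news[t]
    """
    variances = [initial_variance]
    for r, n in zip(returns, news_counts):
        variances.append(
            omega + alpha * r * r + beta * variances[-1] + gamma * n
        )
    return variances


returns = [0.01, -0.02]
params = dict(omega=1e-6, alpha=0.1, beta=0.8, initial_variance=1e-4)
plain = garch_news_variance(returns, [0, 0], gamma=0.0, **params)
with_news = garch_news_variance(returns, [0, 5], gamma=1e-6, **params)
```

With a positive `gamma`, a burst of news raises the next period's variance forecast above what past returns alone would imply.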


Desirable Industry Applications

1. Enhanced Strategies (Asset Management)

Low Frequency Portfolio (rebalancing) early trigger based on “draw down” rules/risk.

High Frequency

• Trading “wish to” trade signals.

• Trading “have to/need to” trade (sell and buy) signals.

• News analytics market views taken into consideration for the “optimal trade execution” algorithms.

{ VWAP, Almgren & Chriss, Lo & Bertsimas }

2. Risk Control and Compliance.

Improved short term risk estimate.

Enhanced downside risk estimate (improving scenario generators by using sentiment scores).

???

Wolf detection: signal to stop trading in a specific stock/asset.

Desirable Industry Applications

3. Post trade analysis (reporting).

4. Refine fundamental research (results/figures)

5. Use by regulator/public body (government treasuries) to take a prior view of the “impact” of (economic and other) announcements


Summary & discussions

Applications of (semi-)automated news analytics in finance are growing in importance.

Pay back can be substantial to:

Investment Managers

Traders

Internal Risk Auditors

Regulators

Knowledge and Skills from three different disciplines:

Information Systems.

Artificial Intelligence.

Financial Engineering & quantitative modelling (including behavioural finance).

are required in various degrees to progress the field/make substantial impact.

THANK YOU FOR YOUR ATTENTION… ANY QUESTIONS…?