How News and Its Context Drive Risk and Returns
Around the World
Charles W. Calomiris and Harry Mamaysky
Columbia Business School
1
Introduction
• Automated processing of natural language is opening a previously unavailable window into market behavior
• It may fundamentally transform finance practice
• Prior work has been very short-term focused
• But isn’t news (in aggregate) important for longer horizon outcomes?
• We look at
• Longer term country-level risk and return responses to news
• How to measure news at the country level?
2
Our approach and a peek at findings…
• We develop a theory-neutral approach to map country news into market outcomes, which measures word flow and examines connections of word flow to risk and return.
• We apply this (for the first time, we think) outside the U.S., to 52 countries.
• EMs vs. DMs treated separately, given differences in returns processes.
Key Findings:
1. Many measures relevant (sentiment, frequency, entropy), EMs/DMs differ.
2. Topical context matters.
3. Results change over time importantly.
4. News generally has opposite implications for return and risk.
5. Drawdown is useful as a measure of risk, especially for EMs.
6. We capture more than a popular a priori measure, in and out of sample.
3
1. Theory-neutral vs. a priori word identifiers
What word flow?
• Theory-neutral vs. a priori approaches (Baker Bloom Davis 2016)
• Theory-neutral does not require advance knowledge of what is important, and avoids data mining risks.
• But is it possible to construct a comprehensive, parsimonious, and flexible theory-neutral model of word flow?
4
2. What aspects of news are important?
• Sentiment
• Frequency
• Unusualness (entropy)
• Interact sentiment and entropy (Glasserman and Mamaysky 2016)
• Topical context interacted with above
• How are topics different from EM to DM?
• How does effect of news, and interpretation of news, differ by topic?
5
3. Regime changes over time?
• Principal components indicate shift point around Global Crisis
• A priori shift point lines up with second principal component
• Out of sample properties of forecasting in light of this change
6
4. How to identify topical context?
• Identifying topic-relevant words and their characteristics
• Louvain method vs. LDA
• Computational difficulty of LDA (pilot study comparison)
7
5. Is all news relevant for both returns and risk?
• Allow our measures to affect both returns and risk and see whether effects tend to be opposite, or unrelated.
• When an effect is statistically significant for both return and risk (sigma or drawdown), do the signs tend to be opposite?
8
6. How to measure risk?
• Especially in EMs, returns are not normal and there is momentum in returns.
• In addition to sigma, we use drawdown (which allows longer term effects from momentum, skew, and kurtosis to be expressed).
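To make the contrast between sigma and drawdown concrete, here is a minimal Python sketch. It assumes drawdown means the largest peak-to-trough decline of the cumulative return path over the window; the slides do not spell out the exact definition, so treat `max_drawdown` and the two toy return paths as illustrative only.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative return path.

    `returns` holds simple periodic returns (0.02 means 2%).
    The result is a non-negative fraction (0.25 means a 25% decline).
    This definition is an assumption, not taken from the slides.
    """
    level = 1.0   # value of one unit invested at the start
    peak = 1.0    # highest level reached so far
    worst = 0.0   # largest decline from a peak so far
    for r in returns:
        level *= 1.0 + r
        peak = max(peak, level)
        worst = max(worst, 1.0 - level / peak)
    return worst


def sigma(returns):
    """Sample standard deviation of returns."""
    n = len(returns)
    mean = sum(returns) / n
    return (sum((r - mean) ** 2 for r in returns) / (n - 1)) ** 0.5


# Two hypothetical paths with the same returns in a different order.
choppy = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
trending = [0.05, 0.05, 0.05, -0.05, -0.05, -0.05]

print(sigma(choppy), sigma(trending))                # same volatility
print(max_drawdown(choppy), max_drawdown(trending))  # trending path falls further
```

Both paths have the same sigma, but the path with momentum has the larger drawdown, which is the kind of longer-term effect sigma alone does not express.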
9
7. How to analyze countries, together or not?
• Advantages to panel.
• Disadvantages from pooling (if processes are different)
• We separate EMs and DMs and analyze each as a panel.
10
8. What news source?
• Thomson-Reuters provides a common platform, English language, and large sample of relevant countries, for which there are other data on returns and on various relevant variables.
11
Data construction
• Data cleaning
• Financial economics word list (Beim-Calomiris plus)
• Topics (Louvain groups), EM vs. DM topic groups
• Sentiment (Loughran-McDonald)
• Entropy (4-grams)
• Context-specific measures of sentiment, frequency, entropy
12
Text measures defined
Data
• Thomson-Reuters digital news archive from 1996 to 2015
• 5mm EM and 12mm DM articles
• 52 countries (list next page)
Text measures:
• artcount – number of articles per country per month
• entropy – “unusualness” of an article j (Glasserman and Mamaysky 2016)
$$H_j = -\sum_{i \in \{\text{4-grams}\}} p_i \log m_i$$
• Effectively the average log probability of a word conditional on preceding words
• sentiment – the difference of positive and negative words divided by total words in article j:
$$s_j = \frac{POS_j - NEG_j}{a_j}$$
• Word sentiment comes from Loughran-McDonald dictionary
13
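The two article-level measures can be sketched in Python as follows. Several things here are assumptions rather than details from the slides: the tiny `POSITIVE` and `NEGATIVE` sets stand in for the Loughran-McDonald lists, p_i is taken to be the share of 4-gram i among the article's 4-grams, m_i is taken to be the probability of the fourth word given the first three as estimated on a training corpus, and the smoothing constants in `entropy` are chosen only so that unseen 4-grams get a finite value.

```python
import math
from collections import Counter

# Hypothetical stand-ins for dictionary entries (not the real word lists).
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "default", "weak"}


def sentiment(tokens):
    """(positive words - negative words) / total words in the article."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def ngrams(tokens, n):
    """All consecutive n-word tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(tokens, train_tokens):
    """H_j = -sum_i p_i * log(m_i) over the 4-grams of one article."""
    c4 = Counter(ngrams(train_tokens, 4))
    c3 = Counter(ngrams(train_tokens, 3))
    grams = Counter(ngrams(tokens, 4))
    total = sum(grams.values())
    h = 0.0
    for gram, count in grams.items():
        p = count / total                        # share of this 4-gram in the article
        m = (c4[gram] + 1) / (c3[gram[:3]] + 10)  # smoothed conditional probability (assumed form)
        h -= p * math.log(m)
    return h


train = "the central bank raised interest rates today".split() * 20
familiar = "the central bank raised interest rates".split()
unusual = "the central bank lowered reserve requirements".split()

print(sentiment("strong gain despite one loss".split()))
print(entropy(familiar, train), entropy(unusual, train))
```

Text that repeats phrases seen in the training corpus gets a low entropy, while text with previously unseen word sequences gets a high one.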
Topics
Intuition: Find groups of words that co-occur together in articles
Details:
• 1,240 econ words
• Start w/ 237 words from index of Beim and Calomiris (2001) and find other words, bigrams and trigrams from EM corpus based on cosine similarity
• E.g.: barriers, currency, parliament, macroeconomist, and World Bank
• We have 2 document-term matrices:
• 5mm x 1,240 for EM and 12mm x 1,240 for DM
• Compute cosine similarity matrix (1,240 x 1,240)
• Then do community detection (using Louvain method for modularity maximization)
• Our topics are mutually exclusive (this is not necessary)
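The first two steps above can be sketched in plain Python: build the word-by-word cosine similarity matrix from a document-term matrix, then score a candidate grouping of words by its modularity, which is the quantity the Louvain method tries to maximize. This sketch does not implement the Louvain search itself, and the toy matrix and word labels are hypothetical.

```python
import math


def cosine_similarity_matrix(doc_term):
    """Cosine similarity between the word columns of a document-term matrix."""
    n_words = len(doc_term[0])
    cols = [[row[w] for row in doc_term] for w in range(n_words)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    sim = [[0.0] * n_words for _ in range(n_words)]
    for a in range(n_words):
        for b in range(n_words):
            # Leave the diagonal at zero so the word graph has no self-loops.
            if a != b and norms[a] > 0 and norms[b] > 0:
                dot = sum(x * y for x, y in zip(cols[a], cols[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


def modularity(sim, labels):
    """Newman modularity of a partition of a weighted, undirected graph."""
    n = len(sim)
    degree = [sum(row) for row in sim]
    two_m = sum(degree)
    q = 0.0
    for a in range(n):
        for b in range(n):
            if labels[a] == labels[b]:
                q += sim[a][b] - degree[a] * degree[b] / two_m
    return q / two_m


# Rows are articles; columns are currency, devaluation, parliament, election.
doc_term = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
sim = cosine_similarity_matrix(doc_term)
print(modularity(sim, [0, 0, 1, 1]))  # words grouped by co-occurrence
print(modularity(sim, [0, 1, 0, 1]))  # a grouping that cuts across co-occurrence
```

The grouping that keeps co-occurring words together scores higher, so a modularity-maximizing search would prefer it as a pair of topics.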
14
We find 5 topics for each group of countries
• The Louvain algorithm returns ~40 word clusters with the following numbers of words
• Place words from small clusters into big clusters
15
Topics for EMs
16
Topics for DMs
17
Context specific sentiment
• Let $f_{\tau,j}$ be the fraction of econ words in article j that are about topic τ
• Topic sentiment is given by: $s_{\tau,j} = f_{\tau,j} \times s_j$
• Aggregate the article level measures into daily measures (weighted by number of overall words in an article)
For a given country, we have 12 daily text measures:
• entropy
• article count
• sMkt / fMkt
• sGovt / fGovt
• sCorp / fCorp
• sComms / fComms
• DM/EM specific:
• sMacro / fMacro (EM)
• sCredit / fCredit (DM)
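A small Python sketch of the topic sentiment formula and the word-weighted daily aggregation described above. The dictionary layout (`n_words`, `sentiment`, `fractions`) and the two example articles are hypothetical; only the formula and the weighting rule come from the slide.

```python
def topic_sentiment(s_j, topic_fractions):
    """Topic sentiment of one article: topic fraction times article sentiment."""
    return {topic: f * s_j for topic, f in topic_fractions.items()}


def daily_aggregate(articles):
    """Word-count-weighted average of article-level topic sentiment for one country-day."""
    total_words = sum(a["n_words"] for a in articles)
    out = {}
    for a in articles:
        weight = a["n_words"] / total_words
        for topic, value in topic_sentiment(a["sentiment"], a["fractions"]).items():
            out[topic] = out.get(topic, 0.0) + weight * value
    return out


# Two hypothetical articles for the same country and day.
articles = [
    {"n_words": 300, "sentiment": -0.02, "fractions": {"Mkt": 0.5, "Govt": 0.5}},
    {"n_words": 100, "sentiment": 0.04, "fractions": {"Mkt": 1.0, "Govt": 0.0}},
]
print(daily_aggregate(articles))
```

Here the longer, mildly negative article dominates the Govt measure, while the short positive article pulls the Mkt measure above zero.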
18
Principal Components EM
19
EM Sentiment
• For 140 EM sentiment series (28 countries x 5) we look at first 2 principal components
• PC2 – relative sentiment of Markets to Government
• Some evidence of a regime shift in PC2 a little before the financial crisis
Principal components EM
20
DM Sentiment
• For 120 DM sentiment series (24 countries x 5) we look at first 2 principal components
• PC2 – relative sentiment of Markets to Government (again!)
• Some evidence of a regime shift in PC2 a little before the financial crisis
Event Studies
• High-frequency top and bottom deciles of sentiment
• Middle as placebo
• Returns lead major sentiment indicators at high frequency
• Some post-event drift for positive and negative events
21
Event studies – EM
• Cumulative abnormal return around deciles of daily news events
• Middle column is control for boring news
• Some topics show post event drift: Mkt (both), Comms (negative)
• This is very different from single name results, where there is little evidence of drift post negative news (only post positive)!
22
Event studies – DM
• Some topics show post event drift: Mkt (negative, both?), Corp (positive), Credit (both)
23
Regression results
• We run panel regression with dependent variables given by
• return
• return12
• sigma
• drawdown
• We control for many variables that have been shown to have some forecasting power for future returns (next page)
• The no-text measure regression is our Baseline model
• All text measures (except entropy) are normalized to unit variance
• We run full sample, 1st and 2nd half of the sample
24
Control variables
25
Summary of regression results
• News matters for EM and DM!
• Results differ across EM and DM (e.g., artcount matters in EMs)
• Baseline R2 lower for EM
• % increase in R2 from text measures larger for EM
• Sign of effects (i.e. good news or bad) almost always is consistent across return, sigma, and drawdown
• Context matters: positive sentiment in Govt, Corp – bad news; positive sentiment in Mkt – good news
• Incremental explanatory power largest for return12 and drawdown; explanatory power lower for return and sigma
• Evidence of state dependence, especially for entropy
• Goes from a “bad” pre-crisis to a “good” post-crisis
26
Summary of regression results
27
28
29
Out-of-sample testing
• Do we have too many explanatory variables?
• What about regime shifts?
• Check out-of-sample forecasting performance
• Run rolling 5-year regressions over months t-60, …, t-1 to forecast month t outcomes
Lasso (least absolute shrinkage and selection operator)
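$$\min_{\beta} \; \frac{1}{2N} \sum_{i,t} \left( y_{i,t} - x_{i,t-1}' \beta \right)^2 + \lambda \lVert \beta \rVert_1$$
• Lasso does shrinkage and model selection
• Amount of shrinkage given by $\lVert \beta \rVert_1 / \lVert \beta^{OLS} \rVert_1$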
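The lasso objective above can be minimized by coordinate descent with soft-thresholding. The Python sketch below is one standard way to do that, not necessarily the estimator used for the results in these slides. It assumes no intercept, treats each row of `X` as the lagged predictors for one country-month, and uses a hypothetical toy data set.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, setting it to zero inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Minimize (1/2N) * sum((y - X beta)^2) + lam * ||beta||_1 by coordinate descent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            rho = 0.0  # covariance of regressor j with the partial residual
            z = 0.0    # mean square of regressor j (assumed non-zero)
            for i in range(n):
                partial = sum(X[i][m] * beta[m] for m in range(k) if m != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


def shrinkage(beta, beta_ols):
    """Ratio of L1 norms: 1 means no shrinkage, 0 means every coefficient is zero."""
    return sum(abs(b) for b in beta) / sum(abs(b) for b in beta_ols)


# Hypothetical data: the first regressor matters, the second barely does.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

beta_ols = lasso(X, y, lam=0.0)    # lam = 0 recovers least squares here
beta_lasso = lasso(X, y, lam=0.5)
print(beta_ols, beta_lasso, shrinkage(beta_lasso, beta_ols))
```

With the penalty switched on, the weak second coefficient is set exactly to zero (model selection) and the first is pulled toward zero (shrinkage).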
30
When the model has little to say
31
Rolling lasso for DM drawdown
32
Rolling lasso for EM drawdown
33
Out-of-sample performance
• Naïve model forecasts using country fixed effects
• Base model includes only the non-text variables
• CM includes all text measures
• All models estimated using lasso
34
Out-of-sample comparisons to EPU
• EPU counts articles from 10 major papers that contain triplets from uncertainty × economic × {policy terms}
• For 5 EM and 11 DM countries where we have EPU data, compare out-of-sample performance of Base vs Base + alternative text measures
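As a rough illustration of the triplet rule behind an a priori measure like EPU, the Python sketch below flags an article when it contains at least one term from each of three sets. The term sets are short illustrative examples, not the actual EPU lists, and the function names and sample articles are hypothetical.

```python
import re

# Illustrative term sets only; the real lists are longer and country-specific.
UNCERTAINTY = {"uncertainty", "uncertain"}
ECONOMIC = {"economic", "economy"}
POLICY = {"congress", "deficit", "legislation", "regulation"}
TERM_SETS = (UNCERTAINTY, ECONOMIC, POLICY)


def is_epu_article(text):
    """True if the article has at least one term from every set."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(words & terms for terms in TERM_SETS)


def epu_count(articles):
    """Number of articles that satisfy the triplet rule."""
    return sum(is_epu_article(a) for a in articles)


sample = [
    "Economic uncertainty rises as Congress debates the deficit.",
    "The economy grew strongly last quarter.",
    "Regulation remains uncertain.",
]
print(epu_count(sample))
```

Only the first sample article matches all three sets, which shows how such a measure depends entirely on the word lists chosen in advance.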
35
Conclusions
• Useful information in text for medium-term country-level outcomes
• Different dimensions of text matter
• In particular, context
• Effects differ across EM and DM, and over time
• Evidence of out-of-sample forecasting ability
• Next:
• Currency effects?
• Trading strategies?
36