Big data, news, and economics
Leif Anders Thorsrud
BI and Norges Bank
April 2019
Disclaimer: This work should not be reported as representing the views of Norges Bank. The views expressed are those of the authors and do not necessarily reflect those of Norges Bank.
Framing expectations
Today’s topic:
“Kunstig intelligens/data science fra A til Å” (“Artificial intelligence/data science from A to Z”)
What is Artificial Intelligence (AI)?
Combining potentially many Machine Learning (ML) algorithms to solve complex problems (Taddy (2018))
This presentation:
Will focus on the problems addressed in Norges Bank’s Unstructured Big Data Project, and why AI/ML was useful
Will not focus on the technicalities (ML/IT)
Core questions in (macro)economics
What is the state of the economy now?
What’s causing business cycles?
How are expectations formed?
Why and how is monetary policy communication important?
How to answer: Economics in one slide
y = E(x|I)
y = Outcomes
E = Expectations
x = “the future”
I = Information
In words: Outcomes today are a function of our expectations about the future given the information we have today
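A small numerical illustration of y = E(x|I) may help here. This is only a sketch under stated assumptions: the joint distribution, the signal labels, and the function names `conditional_expectation` and `unconditional_expectation` are hypothetical and do not come from the presentation.

```python
from fractions import Fraction

# Hypothetical joint distribution over (information signal, future outcome x).
# Keys are (signal, x); values are probabilities. All numbers are illustrative.
joint = {
    ("good news", 2): Fraction(3, 10),
    ("good news", 0): Fraction(1, 10),
    ("bad news", 2): Fraction(1, 10),
    ("bad news", 0): Fraction(5, 10),
}


def conditional_expectation(joint, signal):
    """Return E(x | I = signal) for a discrete joint distribution."""
    mass = sum(p for (s, _), p in joint.items() if s == signal)
    if mass == 0:
        raise ValueError(f"signal {signal!r} has zero probability")
    return sum(x * p for (s, x), p in joint.items() if s == signal) / mass


def unconditional_expectation(joint):
    """Return E(x), the expectation when no information is observed."""
    return sum(x * p for (_, x), p in joint.items())


y_good = conditional_expectation(joint, "good news")  # outcome after good news
y_bad = conditional_expectation(joint, "bad news")    # outcome after bad news
y_prior = unconditional_expectation(joint)            # outcome with no signal
```

In this toy setting the outcome y differs depending on which information arrives, which is the sense in which information flow matters for choices and outcomes.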
Textbook economics
Event → Information flow → Choices and outcomes
Full Information (I) Rational Expectations (FIRE)
y = E(x|I)
Reality
Event → Information flow → Choices and outcomes
Textbook economics: Full Information (I) Rational Expectations (FIRE)
y = E(x|I)
Real life: The ether matters
People read the news to get information
y = E(x|I(media))
Challenges and opportunities
Our challenge: Quantify news
News and media data is often textual
This is Big Data: many words and articles, and highly unstructured
Need AI, but algorithms and data required are new to economists
Challenges and opportunities
Norges Bank solution: Unstructured Big Data Project (UBDP)
Broad mandate: Investigate potential usefulness of “Text as Data”
Identify/document its benefits over conventional economic data
Speak to economic theory and the academic literature
Managed and run by Norges Bank Research department
Started late 2016
In a nutshell: Do data sources like these...
What is the state of the economy now?
What’s causing business cycles?
How are expectations formed?
Why and how is monetary policy communication important?
...contain useful information for answering our questions?
Lessons learned
What is the state of the economy now?
A Newsy Coincident Index (NCI) for Norway (Thorsrud (2018))
High-frequency indicator of the business cycle
Like a daily survey (but much cheaper)
Text as data: Benefits,...
Potentially available at a high frequency
Potentially reflecting the broader economy
Financial data is high frequency, but NOT reflecting the broader economy
Text as data: Benefits,...
Potentially available at a high frequency
Potentially reflecting the broader economy
Potentially capturing economically relevant concepts not measured by conventional hard economic data
Financial data is high frequency, but NOT reflecting the broader economy
E.g., politics, natural disasters, and uncertainty
Uncertainty and Brexit (Larsen (2017))
Topic: EU
What’s causing business cycles: Fluctuations in uncertainty?
(EU) Uncertainty shock and macro responses:
Text as data: Benefits,...
Potentially available at a high frequency
Potentially reflecting the broader economy
Potentially capturing economically relevant concepts not measured by conventional hard economic data
A number is a fact, but the media in which it is presented/discussed/opinionated adds to the information
Financial data is high frequency, but NOT reflecting the broader economy
E.g., politics, natural disasters, and uncertainty
I.e., there might be an independent (causal) media effect
How are expectations formed? (Larsen and Thorsrud (2017))
(1) Before strike, (2) Strike period (no media, but news), (3) After strike
r_{2,·}: Treatment group (“exposed to media”)
r_{1,·}: Control group (“not exposed to media”)
Formally:
\[ \Delta r_{i,d-ba} = \delta + \tau \Delta w_i + \Delta u_i \]
where
\[ \tau = \Delta r_m = \Delta r_{2,s} - \Delta r_{1,s}, \qquad \Delta r_{2,s} = \bar{r}_{2,2} - \bar{r}_{2,1}, \qquad \Delta r_{1,s} = \bar{r}_{1,2} - \bar{r}_{1,1} \]
If \bar{r}_{2,1} = \bar{r}_{1,1}, then
\[ \Delta r_m = \bar{r}_{2,2} - \bar{r}_{1,2} \]
Figure, average returns, all firms. Strike effect: \Delta r_s = \bar{r}_{0,2} - \bar{r}_{0,1} \approx -0.62
Figure, average returns by group. Media effect: \Delta r_m = \bar{r}_{2,2} - \bar{r}_{1,2} \approx -0.57
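The group comparison above has the form of a difference-in-differences calculation, which can be sketched in a few lines. This is a minimal sketch, not the authors' code: the function name `diff_in_diff` and all return figures (in basis points) are hypothetical, chosen so that both groups share the same pre-strike mean.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_during, control_before, control_during):
    """Change in the treated group's mean minus change in the control group's mean."""
    d_treated = mean(treated_during) - mean(treated_before)  # plays the role of Δr_{2,s}
    d_control = mean(control_during) - mean(control_before)  # plays the role of Δr_{1,s}
    return d_treated - d_control                             # plays the role of Δr_m


# Hypothetical returns in basis points (illustrative only, not from the paper).
treated_before = [10, 30]    # group 2, period 1
treated_during = [-90, -70]  # group 2, period 2 (strike)
control_before = [20, 20]    # group 1, period 1
control_during = [-30, -10]  # group 1, period 2 (strike)

media_effect = diff_in_diff(
    treated_before, treated_during, control_before, control_during
)

# When both groups have the same pre-strike mean, the estimate reduces to the
# simple difference in strike-period means.
simple_diff = mean(treated_during) - mean(control_during)
```

With equal pre-strike means, the full difference-in-differences and the simple strike-period difference coincide, mirroring the simplification of Δr_m stated above.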
Text as data: Benefits, theory, and some UBDP output
Intuitive benefits
Potentially available at a high frequency
Potentially reflecting the broader economy
Potentially capturing economically relevant concepts not measured by conventional hard economic data
A number is a fact, but the media in which it is presented/discussed/opinionated adds to the information
In theory
News-driven/sentiment-driven business cycle view
Rational (in)attention theory and information rigidities
Narrative economics
Norwegian data
“Words are the new numbers: A newsy coincident index of the business cycle” (Thorsrud (2018))
“Components of Uncertainty” (Larsen (2017))
“Asset returns, news topics, and media effects” (Larsen and Thorsrud (2017))
“The Value of News for Economic Developments” (Larsen and Thorsrud (2018b))
International data
“News-driven inflation expectations and information rigidities” (Larsen et al. (2019))
“Business cycle narratives” (Larsen and Thorsrud (2018a))
How (success factors)?
Algorithms
Combined (close to) off the shelf Machine Learning algorithms from the Natural Language Processing literature with conventional tools used in econometrics
Latent Dirichlet Allocation, Dynamic Factor Models, Latent Threshold Models,...
Keywords: Dimension reduction, sparsity, and non-linearity
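To make the idea of turning text into numbers concrete, here is a deliberately simplified sketch. It is not the project's method: it swaps in hand-written keyword lists where the project used Latent Dirichlet Allocation, which learns topics from the data. The names `TOPICS`, `tokenize`, and `daily_topic_shares`, and the example headlines, are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical hand-made topic dictionaries (an assumption for illustration).
# A topic model such as LDA would estimate word-topic weights instead.
TOPICS = {
    "oil": {"oil", "barrel", "opec"},
    "eu": {"eu", "brexit", "referendum"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-zæøå]+", text.lower())


def daily_topic_shares(articles):
    """Map each date to the share of that day's tokens falling in each topic."""
    counts = defaultdict(Counter)  # date -> topic -> number of matching tokens
    totals = Counter()             # date -> total number of tokens
    for date, text in articles:
        tokens = tokenize(text)
        totals[date] += len(tokens)
        for topic, words in TOPICS.items():
            counts[date][topic] += sum(tok in words for tok in tokens)
    return {
        date: {topic: counts[date][topic] / totals[date] for topic in TOPICS}
        for date in totals
        if totals[date] > 0
    }


# Hypothetical headlines, one per day.
articles = [
    ("2016-06-24", "Brexit referendum shocks EU markets"),
    ("2016-06-25", "Oil falls five percent"),
]
shares = daily_topic_shares(articles)
```

The result is a small daily panel of topic shares, i.e., a high-dimensional text source reduced to a few time series that conventional econometric tools can then take as input.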
IT
All computations done in simple cloud environment. In-house hardware not adequate:
“Small” computers, firewall/security issues, software restrictions
Keywords: Flexible and low cost for R&D
Dissemination
Internal (and external) courses and presentations on using “Text as data”
Difficulties
Algorithms and data
Economists often care more about the story than the outcome, i.e., they often look for a causal explanation rather than the best prediction. ML/AI is better at, or mostly used for, the latter. Work to combine the two is ongoing
Domain knowledge is important
Constructing the appropriate data sets might be difficult. Need to rely on external provider(s), which can be expensive, or construct them ourselves
Text data is abundant. (Macro)Economic data is scarce (relative to text):
Becomes an issue for training algorithms
Supervised versus unsupervised learning
Difficulties cont’d
From research to production?
New techniques and data require different skills: Need to train staff, or hire people with the right skills
And an interest in economics. Domain knowledge is important!
Ownership: ML/AI is often a black box (and more so for those who have not done the development). Hard to get people to use things they do not understand
This is often a good thing, but also relates to a preference for “causal understanding”
Data and model management: Need a well-functioning data management/science team/infrastructure
Conclusions and (potential) advice
Combining potentially many Machine Learning algorithms to solve complex problems works in economics too. Some general lessons/advice:
Focus on the questions, then the tools (algorithms) needed to solve the problem
Make sure the questions are of relevance for your business. “Need more of what we do not have rather than more of what we already have”
Start small
E.g., a pilot based on cloud infrastructure might be sensible when output is uncertain
Expect resistance
Successful (eventual) implementation in production requires that people outside the project group “understand” the work being done
Value domain knowledge
References I
Larsen, V. H. (2017, April). Components of uncertainty. Working Paper 2017/5, Norges Bank.
Larsen, V. H. and L. A. Thorsrud (2017). Asset returns, news topics, and media effects. Working Paper 2017/17, Norges Bank.
Larsen, V. H. and L. A. Thorsrud (2018a). Business Cycle Narratives. Working Paper 2018/03, Norges Bank.
Larsen, V. H. and L. A. Thorsrud (2018b). The Value of News for Economic Developments. Journal of Econometrics (Forthcoming).
Larsen, V. H., L. A. Thorsrud, and J. Zhulanova (2019, February). News-driven inflation expectations and information rigidities. Working Paper 2019/5, Norges Bank.
Taddy, M. (2018, January). The Technological Elements of Artificial Intelligence. University of Chicago Press.
Thorsrud, L. A. (2018). Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics (Forthcoming).