Evolution of Financial Studies Over Forty Years:
What Can We Learn from Machine Learning?
Abstract
How have finance research topics evolved over the past forty years? We apply
machine learning models of textual analysis to 20,185 abstracts of finance articles
published between 1976 and 2015 and identify 38 research topics. We present the fastest
growing topics among published and working papers. Our algorithm can be used to categorize
articles without JEL codes. We use a citation network to show how topics are related
and cluster the topics into five “territories”. Moreover, we find a strong bibliometric
regularity: the number of researchers covering n topics is approximately 1/2^n of those
covering just one topic.
JEL Classification: G00, G10, G20, G30, B26
Keywords: Textual Analysis, Machine Learning, Network Analysis, Evolution of
Financial Studies
1. Introduction
Finance researchers are interested in knowing what other finance researchers work on,
but not every researcher has a full picture of all topics in finance research and their connections.
Therefore, an analysis of all academic publications in finance may be beneficial to those who
desire to have an overview of this academic profession and inspire more cross-topic research.
How have finance research topics evolved over the past forty years? Were there topics
that were popular decades ago but attract little attention today? Which topics attracted the
most attention in the most recent decade? To answer these questions, we need to 1) construct a
comprehensive sample that contains most of the finance articles published in the last 40 years,
and 2) determine which topic each article belongs to.
To address the first requirement, we collect information on 20,185 academic articles
published in 17 finance journals between 1976 and 2015. To determine each article’s topic by
hand, we would need to read each article, compile a list of the topics that the articles mainly
cover, and finally classify each article. Human reading is not only time-consuming but also
constrained by the reader’s capacity to absorb the material. Instead, we employ textual analysis
techniques to analyze the literature.
Textual analysis has been used in finance literature to process textual data of media news
(e.g. Tetlock, 2007; Tetlock, Saar‐Tsechansky, Macskassy, 2008), financial disclosures (e.g.
Loughran and McDonald, 2011; Loughran and McDonald, 2014), Form S-1 on IPO SEC filings
(e.g. Loughran and McDonald, 2013), product descriptions (e.g. Hoberg and Phillips, 2010)
and other sources. As far as we know, textual analysis has not been applied to finance academic
research itself to categorize topics and analyze the connections between papers.
We apply two popular unsupervised machine learning algorithms of textual analysis,
latent Dirichlet allocation¹ (LDA) and the dynamic topic model (DTM), to determine 1) the number
of topics in the finance literature; 2) the topics that each article focuses on; 3) which
topics grew most and declined most in recent years; and 4) the evolution of specific interests
within each topic. For example, in the banking area, have researchers become more interested in
banking networks recently?
It is natural to use JEL classification codes to tell which topic an article belongs
to. However, there are three reasons why JEL codes are not sufficient for analyzing historical
trends. First, some journals do not provide JEL codes, such as The Journal of Finance (JF) and
Journal of Financial and Quantitative Analysis (JFQA). Second, although some journals
provide JEL codes today, they did not provide them in early years. For example, Review of
Financial Studies (RFS) had its first volume in 1988 but started providing JEL codes in 2007².
Therefore, it is difficult to analyze the early articles and the historical trend. After searching
Web of Science, ScienceDirect, JSTOR, every journal’s official website, and each
published article’s working paper record, we still find that 65.2% of all articles have no JEL codes.
Missing JEL codes were more common in early years: 79.6% (or 87.1%,
98.2%) of all articles in Journal of Financial Economics (JFE), JF and RFS before 2005 (or
2000, 1995) do not have JEL codes³. JFE was the earliest of these
three journals to provide JEL codes, starting in 1994.
Third, the JEL codes are self-reported and they often change. After comparing the JEL
codes of published articles and their last version of working paper before publication, we find
1 As of January 1, 2018, the LDA article by Blei, Ng, and Jordan (2003) has been cited 21,464 times on
Google Scholar.
2 It’s similar in other journals. For example, Journal of Banking and Finance had its first volume in
1977 but started providing JEL codes in 1993; Journal of Futures Markets had its first volume in
1981 but only some of its articles started providing JEL codes in 2013.
3 73.9% (or 82.0%, 96.5%) of all published articles before 2005 (or 2000, 1995) do not have JEL
codes.
that 31.14% of articles changed JEL codes at least once. When we consider the different
versions of working papers, the percentage of change is even higher. The JEL codes are
subjective and there is little research discussing whether the authors’ classification is accurate.
In this research, we provide another way to obtain an objective classification by using the
unsupervised machine learning that minimizes the human input of prior knowledge. We find
our algorithm-computed topics of the articles and their self-reported JEL codes comparable.
Therefore, we are able to apply the machine learning algorithm on the articles without JEL
codes to determine their topics.
Our unsupervised machine learning algorithm reads all abstracts⁴ and shows that
published research can be categorized into 38 topics⁵. The largest topics include “Option
Pricing”, “Commercial Banking”, “CEO, Board, Director”, “Market Microstructure”, “Central
Bank, Monetary Policy”, and “Mergers and Acquisitions”. Besides traditional asset pricing and
corporate finance topics, we also identify topics such as “Social Network and Cultural Effect”
and “Venture Capital, Entrepreneurship”. We plot each topic’s historical publication number
and show the rise and fall over time. Publications on “Financial System, Banking Crisis” and
“Hedge Fund, Mutual Fund” increased the fastest in the past decade.
We also apply the LDA model trained on published papers to 130,547 working paper
abstracts that we obtained from the SSRN Financial Economics Network. We find that working
papers on “Social Network and Cultural Effect”, “News, Analyst Report, Earnings
Announcement”, and “International Capital Markets” grew fastest from 2006 to 2015. “Market
4 We also used the full text of all articles as input of the model. Due to noisier information in the full
text compared to abstracts such as discussion of prior literature, the topics generated only using abstracts
are better categorized.
5 More rigorously, the published research is categorized into 50 topics including 12 general sentence
topics that do not indicate specific research interest. For example, a topic with keywords “relat”, “posit”,
“neg”, “associ” and “evid” may represent an often used general sentence “we provide evidences on a
positive/negative relation/association”. See Section 3.1 and 5.1 for detailed explanation.
Microstructure”, “Macro Finance”, and “Statistical Estimation Methodology” experienced the
greatest contraction during the same period.
The advantage of unsupervised machine learning over supervised machine learning is
its minimal need for human input. For example, the optimal number of topics is determined
by the algorithm⁶, not chosen by us. One of the few points of human involvement in the analysis is
naming the topics based on the keywords that the algorithm chooses to represent each topic.
Supervised machine learning needs humans to label the training sample so that the machine
can be “taught”. The labelling process may introduce bias or even errors⁷. In contrast, an
unsupervised machine learning algorithm does not need labelled data⁸.
Using dynamic topic models, we present how specific research interests evolve. For
example, within the topic of “Determinants of Stock Return” there were many publications on
the January effect of stock prices⁹ before 1990. Since 2000, the January effect has no longer been
among researchers’ top interests, and research on momentum strategies and cross-sectional analysis
has become more popular.
The next question that we examine is how the topics are related to each other. To answer
6 As shown in Section 5.1 and Fig. 1, the optimal number of topics should be accompanied by the
highest computed log-likelihood of the data from the trained model.
7 There is another disadvantage of using supervised machine learning. When there are more data, the
researcher must provide more pre-labelled samples to train the model. Therefore, a supervised machine
learning algorithm requires human labelling every time the dataset changes. Since unsupervised
machine learning does not use pre-labelled samples, it does not have this disadvantage and can adapt to
other datasets easily.
8 It is also difficult to use the articles with JEL codes to train a supervised machine learning model,
and then apply it on the articles without JEL codes. The major problem is that the articles with JEL
codes are usually more recent, and the research topics and specific words used may be different from
early articles without JEL codes.
9 The January effect is a hypothesis that there is a seasonal increase in stock prices during the month of
January.
this question, we plot a citation network between topics and show that the research topics can
be largely grouped into five “territories”: asset pricing, corporate finance, market
microstructure, banking and macro finance, and “mixed areas”. From the network figure, we
easily see that the research of “Mergers and Acquisitions” is closer to “CEO, Board, Director”
compared to “Commercial Banking”.
Moreover, we find a strong bibliometric regularity: the number of researchers covering
n topics is approximately 1/2^n of those covering one topic. We also find that, on average,
a published finance article covers fewer research topics over the years, which indicates that
published articles have become more focused rather than broad.
Compared with prior related research that used no more than several thousand articles,
our sample is larger and more representative of the whole body of literature. To the best of our
knowledge, it is among the first machine learning studies of finance academic publications.
The rest of this article is organized as follows. Section 2 reviews the prior research on
academic profession. Section 3 discusses the sources and how we clean the textual data. Section
4 explains methodologies, mainly the two machine learning models - latent Dirichlet allocation
(LDA) and dynamic topic model (DTM). Section 5 presents our results and Section 6 concludes.
2. Literature Review
We believe that our research is among the first to study the evolution of research topics
in finance, but there is prior research about academic profession in general. The earliest works
include Froman (1952), Cleary and Edwards (1960), Henry and Burch (1974) and Klemkosky
and Tuttle (1977). Though methodologically simple, they provided important insights. For
example, Froman (1952) generated summary statistics of graduate students in economics
before the 1950s, presenting the institutions that granted the most degrees. Klemkosky and
Tuttle (1977) found that the University of Chicago, the University of Pennsylvania, Stanford
and UCLA contributed most to financial research and journal publication from 1966 to 1975.
Descriptive findings are also found in Heck, Cooley, and Hubbard (1986), Schwert (1993),
Niemi (1987) among others.
This field of research continued to grow in the 1990s. Chung and Cox (1990) found
that in an academic journal, the number of researchers who published n articles is equal to 1/n^c
of the number of researchers who published just one article in this journal. They estimated that c
is approximately 2 for JF and JFE. Zivney and Bertin (1992) found that many researchers who
became productive later in their careers were incorrectly screened from tenure, while many
researchers who passed the mechanical screens ceased to publish following tenure. They argued
that simply knowing the number of publications and where the articles appeared is insufficient
for reliably predicting future research productivity. Alexander and Mabry (1994) ranked
journals according to the number of citations.
Borokhovich, Bricker and Simkins (1994) found that JF and JFE were the core
influences in finance research, and that most journals published in a variety of research areas
but were influential in a smaller number of them during the sample period. Borokhovich, Bricker, Brunarski
and Simkins (1995) found a skewed distribution of academic institutions’ influence; a relatively
small number of institutions contributed a majority of top journal publications and citation.
Corrado and Ferris (1997) investigated what kind of articles were used in doctoral
education. Swidler and Goldreyer (1998) concluded that top journal publication helps
researchers with promotion and salary increase. They estimated that the first top finance journal
publication provided the author with a then present value of between $19,493 and $33,754.
In more recent publications, Azoulay, Wang, and Zivin (2010) found a decline of
collaborators’ productivity following the premature death of an academic “superstar”. Brogaard,
Engelberg, and Parsons (2014) showed that editors’ personal connections help them screen
articles in the reviewing process. Welch (2014) found that referees 1) differ in their scales,
as some referees were intrinsically more generous than others, and 2) differ in their opinions
of what a good paper was, as they often disagreed about the relative ordering of papers.
3. Data
As shown in Table 1, our sample consists of 20,185 articles published in 17 academic
finance journals from 1976 to 2015. We obtain each article’s title, authors, affiliations, abstract,
full text, references, citations and publishing date from Web of Science, supplemented with
ScienceDirect¹⁰, JSTOR and manual search. Table 1 lists the journals and their summary
statistics, including the first year in which abstracts are available in our sample. In this research,
we only use the articles’ abstracts in our models¹¹. We have data on RFS from 1988, the year
of its first volume. JF was founded in 1946 and JFE had its first publication in 1974, but Web
of Science started storing these two journals’ data only from 1976. Moreover, Web of Science
stores the article abstracts of JF from 1991 and the article abstracts of JFQA from 1992. We
supplement the missing abstracts of JF between 1976 and 1990 and those of JFQA between
1984 and 1991 from JSTOR and manual search. Journal of Banking and Finance, Journal of
International Money and Finance, Journal of Money Credit and Banking, and JFQA are also
the largest contributors of articles in our sample.
Our sample does not contain finance articles published in economics or accounting
journals. Many articles in these journals are not finance research. We do not hand-pick
finance articles published in economics or accounting journals to supplement our sample,
in order to avoid subjective intervention in the algorithm’s analysis. But when we input all
10 ScienceDirect database is mainly used to get more detailed author names and abstracts of the articles
published by Elsevier. JF and JFQA are not published by Elsevier.
11 We also conducted the analysis using full text data. In categorizing the topics, the effect of using
abstracts is better than using full texts that contain noisier information such as the discussion of prior
literature.
articles published in economics or accounting journals, the algorithm generates many
non-finance topics because these journals have many non-finance articles. Therefore, we only use
the articles published in finance journals.
3.1. Textual Data Cleaning
This section describes the process of cleaning textual data and determining the
parameters of the models in a general way.
The textual data often contain commonly used but uninformative words such as “of”,
“you” or “that”. We generally follow the approach of Hansen, McMahon, and Prat (2014) to
clean the data. For each abstract, we
1. Tokenize the text into words, or tokens, with the word tokenizer in the Natural
Language Toolkit (NLTK)¹².
2. Remove tokens that are numbers or punctuation.
3. Remove tokens with length 1, such as “I”, “a”, and “&”.
4. Convert all tokens to lower case.
5. Remove stop words¹³, which are mainly English pronouns and auxiliary verbs
such as “you”, “your”, “yours”, “am”, “is”, “are”, and “isn’t”.
6. Stem the tokens with the Porter Stemmer¹⁴, a popular stemming algorithm available in the
Python library NLTK. Stemmers bring words with similar meanings to a common
linguistic root. For example, “manage”, “manager”, and “management” all become
12 NLTK is an open-source Python library for English natural language processing. See
http://www.nltk.org/ for more information.
13 In computing, stop words are words being filtered out before processing of natural language text,
which usually refer to the most common words. The list of stop words is at
http://snowball.tartarus.org/algorithms/english/stop.txt
14 For details of the Porter Stemmer, see https://tartarus.org/martin/PorterStemmer/.
“manag” after stemming. We group words with similar meanings together by stemming,
which makes the final results more interpretable to humans.
7. Remove tokens appearing fewer than 5 times.
8. Combine words that frequently appear together as a phrase into one unit for
processing. Appendix Table A.1 lists the 53 phrases that we use. The most frequent
phrases in our textual data are “interest rate”, “united states” and “exchange rate”.
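
The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the paper: a regular expression stands in for the NLTK word tokenizer, `STOP_WORDS` is a tiny hypothetical subset of the stop-word list, the function names are ours, and stemming is left as a pluggable `stem` argument that defaults to the identity.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the paper uses a full English list.
STOP_WORDS = {"is", "to", "the", "of", "and", "are", "you", "that", "we", "in"}

def clean_abstract(text, stem=lambda token: token):
    """Steps 1-6 for one abstract; `stem` defaults to the identity function."""
    tokens = re.findall(r"[A-Za-z]+", text)              # steps 1-2: alphabetic tokens only
    tokens = [t.lower() for t in tokens if len(t) > 1]   # steps 3-4: drop length 1, lower-case
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 5: drop stop words
    return [stem(t) for t in tokens]                     # step 6: stem

def drop_rare(docs, min_count=5):
    """Step 7: remove tokens that appear fewer than `min_count` times in the corpus."""
    counts = Counter(t for doc in docs for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in docs]

def merge_phrases(doc, phrases):
    """Step 8: join adjacent token pairs listed in `phrases` into a single unit."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in phrases:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out
```

To approximate the stemming step, one could pass `stem=nltk.stem.PorterStemmer().stem`, which requires the NLTK package to be installed.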
4. Methodologies
We apply unsupervised machine learning models on the textual data to categorize
unobserved topics. We first obtain each abstract’s probability distribution over topics and each
topic’s probability distribution over words using LDA (Blei, Ng, and Jordan, 2003). Compared
to LDA, DTM (Blei and Lafferty, 2006) considers an additional dimension: time. We then
use DTM to observe how each topic’s probability distribution over words, and hence
its word usage, evolves over time.
Intuitively speaking, LDA categorizes all abstracts into a number of topics. Moreover, it
can analyze an abstract’s quantitative distribution on different topics. For example, LDA may
find that an abstract, for instance Laeven and Levine (2009), is 12.7% on “Systematic Risk and
Risk Premium”, 11% on “Shareholder Right, Ownership Structure”, 10.3% on “Commercial
Banking” and 10.1% on “Financial Regulation”. The remaining percentage is distributed over
other topics. Within a topic, DTM can analyze the evolution of specific interests over time.
The following subsections address the basic concepts of LDA and DTM and how we
apply them to the textual data.
4.1. Latent Dirichlet Allocation
A collection of M abstracts is denoted by $D = \{w_1, w_2, \dots, w_M\}$, and each abstract d
with $N_d$ words is denoted by $w_d = \{w_{d,1}, w_{d,2}, \dots, w_{d,N_d}\}$. The model assumes that text is
generated by unobserved variables $\beta$ and $\theta$ that are to be estimated. Let V denote the number
of unique words across all abstracts, and K denote the number of topics. $\beta_k$ is a V-dimensional
vector over V words for topic k. $\beta_{k,v}$, the vth element in $\beta_k$, represents the probability
of word v appearing given topic k. $\theta_d$ is a K-dimensional vector of probabilities over K topics for abstract
d. $\theta_{d,k}$, the kth element in $\theta_d$, represents the percentage distribution of topic k in abstract d.
LDA assumes that the abstracts are generated in the following process. To generate the
nth word in abstract d, a topic $z_{d,n}$ is sampled from the probability vector $\theta_d$. With the given
topic $z_{d,n}$, a word $w_{d,n}$ is sampled from the distribution $\beta_{z_{d,n}}$. The model assumes that
each word in each abstract in the corpus is generated through this process. Therefore, the
probability of a given corpus D generated through this process is
$$\Pr(D \mid \theta, \beta) = \prod_{d=1}^{M} \prod_{n=1}^{N_d} \sum_{z_{d,n}} \Pr(z_{d,n} \mid \theta_d)\, \Pr(w_{d,n} \mid \beta_{z_{d,n}}) \tag{1}$$
where $\Pr(z_{d,n} \mid \theta_d)$ is the probability of topic $z_{d,n}$ given abstract $d$’s topic composition
$\theta_d$, and $\Pr(w_{d,n} \mid \beta_{z_{d,n}})$ is the probability of word $w_{d,n}$ given topic $z_{d,n}$’s word composition
$\beta_{z_{d,n}}$. The probability of each word is the sum over topics of the product of these two probabilities,
$\sum_{z_{d,n}} \Pr(z_{d,n} \mid \theta_d)\, \Pr(w_{d,n} \mid \beta_{z_{d,n}})$, which is a summation of conditional probabilities on each
topic. The total probability $\Pr(D \mid \theta, \beta)$ is the product of each word’s probability.
We use the following example to illustrate how the above formula works on
hypothetical abstracts and parameters. The hypothetical abstracts are only for explanatory
purposes and are not from real articles. Suppose our collection of abstracts $D$ contains 2
abstracts $w_1$ and $w_2$, where $M$ is 2, and
$w_1$: “Banking is crucial to entrepreneurship.”
$w_2$: “Banking is crucial to investment.”
After our text cleaning process, the above abstracts become
$w_1$: “bank crucial entrepreneurship”
$w_2$: “bank crucial invest”
For $w_1$, $N_1 = 3$, and $\{w_{1,1}, w_{1,2}, w_{1,3}\}$ = {"bank", "crucial", "entrepreneurship"}. For $w_2$,
$N_2 = 3$, and $\{w_{2,1}, w_{2,2}, w_{2,3}\}$ = {"bank", "crucial", "invest"}. We now assign our
parameters’ numerical values. We assign $\beta$’s value matrix as follows:

                          Word (1…V)
Topic (1…K)               bank    crucial    invest    entrepreneurship
1 (banking)               0.7     0.3        0         0
2 (investment)            0       0          1         0
3 (entrepreneurship)      0       0          0         1

The value $\beta_{k,v}$, or $\beta_{topic,word}$, is the probability of the word conditional on the topic. Here we
set the number of topics to be 3, so $K$ is 3; we have a dictionary of 4 unique words, so $V$ is 4.
The topic names are not the direct output of LDA. When we implement the machine learning
strategy, the algorithm only returns the key word list for each topic. We assign topic names to
facilitate readability. For example, under the condition that the topic is 1 (banking), the
word “bank” appears with probability 0.7, and “crucial” appears with probability 0.3. We could
also arrange the matrix into a different representation

Topic 1 (banking)      Topic 2 (investment)      Topic 3 (entrepreneurship)
bank     0.7           invest  1                 entrepreneurship  1
crucial  0.3

where each topic is associated with a list of words and their probabilities. We then proceed to
assume $\theta$ as

                    Topic (1…K)
Abstract (1…M)      1 (banking)    2 (investment)    3 (entrepreneurship)
1                   0.6            0                 0.4
2                   0.7            0.3               0

The value $\theta_{d,k}$, or $\theta_{abstract,topic}$, is the percentage distribution of the topic in the abstract. We
could see that abstract 1 “bank crucial entrepreneurship” consists of 0.6 of topic 1 (banking)
and 0.4 of topic 3 (entrepreneurship), and abstract 2 “bank crucial invest” consists of 0.7 of topic
1 (banking) and 0.3 of topic 2 (investment).
With the assumption of parameters, we could proceed to calculate the probability of
this collection of documents. The probability of the first word “bank” appearing in abstract 1
is calculated as
$$\begin{aligned}
&\sum_{z_{1,1}} \Pr(z_{1,1} \mid \theta_1)\, \Pr(w_{1,1} \mid \beta_{z_{1,1}}) \\
&= \Pr(\text{topic 1} \mid \theta_{\text{abstract 1}})\, \Pr(\text{“bank”} \mid \beta_{\text{topic 1}}) + \Pr(\text{topic 2} \mid \theta_{\text{abstract 1}})\, \Pr(\text{“bank”} \mid \beta_{\text{topic 2}}) \\
&\quad + \Pr(\text{topic 3} \mid \theta_{\text{abstract 1}})\, \Pr(\text{“bank”} \mid \beta_{\text{topic 3}}) \\
&= 0.6 \times 0.7 + 0 \times 0 + 0.4 \times 0 \\
&= 0.42
\end{aligned}$$
and we could multiply the probability of all 3 words in abstract 1 together to obtain the
probability of abstract 1, which is
$$\prod_{n=1}^{N_1} \sum_{z_{1,n}} \Pr(z_{1,n} \mid \theta_1)\, \Pr(w_{1,n} \mid \beta_{z_{1,n}}) = (0.6 \times 0.7) \times (0.6 \times 0.3) \times (0.4 \times 1) = 0.03024$$
Likewise, we could also calculate the probability of abstract 2, which is
$$\prod_{n=1}^{N_2} \sum_{z_{2,n}} \Pr(z_{2,n} \mid \theta_2)\, \Pr(w_{2,n} \mid \beta_{z_{2,n}}) = (0.7 \times 0.7) \times (0.7 \times 0.3) \times (0.3 \times 1) = 0.03087$$
We can then multiply the probability of each abstract together and obtain the probability of
this collection of abstracts, which is calculated as
$$\prod_{d=1}^{2} \prod_{n=1}^{N_d} \sum_{z_{d,n}} \Pr(z_{d,n} \mid \theta_d)\, \Pr(w_{d,n} \mid \beta_{z_{d,n}}) = 0.03024 \times 0.03087 = 9.335088 \times 10^{-4}$$
By adjusting the values of 𝜃 and 𝛽, we would obtain different probability values. The goal of
topic modeling and LDA is to find an optimized set of 𝜃 and 𝛽 so that the computed probability
is maximized.
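
The worked example can be checked numerically with a short Python sketch. The values of `BETA`, `THETA` and `CORPUS` are the hypothetical parameters and cleaned abstracts from the example; the function names are ours and are only illustrative.

```python
# Vocabulary order matches the columns of the beta matrix in the example.
VOCAB = ["bank", "crucial", "invest", "entrepreneurship"]

# BETA[k][v]: probability of word v given topic k (banking, investment, entrepreneurship).
BETA = [
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# THETA[d][k]: share of topic k in abstract d.
THETA = [
    [0.6, 0.0, 0.4],
    [0.7, 0.3, 0.0],
]

CORPUS = [
    ["bank", "crucial", "entrepreneurship"],
    ["bank", "crucial", "invest"],
]

def word_probability(word, theta_d, beta=BETA):
    """Sum over topics of Pr(topic | theta_d) * Pr(word | beta_topic)."""
    v = VOCAB.index(word)
    return sum(theta_d[k] * beta[k][v] for k in range(len(beta)))

def abstract_probability(words, theta_d, beta=BETA):
    """Product of the word probabilities within one abstract."""
    prob = 1.0
    for word in words:
        prob *= word_probability(word, theta_d, beta)
    return prob

def corpus_probability(corpus, theta, beta=BETA):
    """Product of the abstract probabilities, as in Eq. (1)."""
    prob = 1.0
    for words, theta_d in zip(corpus, theta):
        prob *= abstract_probability(words, theta_d, beta)
    return prob

print(word_probability("bank", THETA[0]))       # approximately 0.42
print(abstract_probability(CORPUS[0], THETA[0]))  # approximately 0.03024
print(corpus_probability(CORPUS, THETA))        # approximately 9.335088e-04
```

Changing any entry of `THETA` or `BETA` (while keeping each row summing to 1) changes the corpus probability, which is the quantity that the estimation seeks to maximize.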
However, the optimization of the computation above is generally intractable, as noted
by Hansen, McMahon, and Prat (2014). Therefore, direct maximum likelihood estimation
based on this computation is not applicable. To facilitate the computation, LDA assumes that
each 𝜃𝑑 is a K-dimensional Dirichlet random variable 𝐷𝑖𝑟𝑖𝑐ℎ𝑙𝑒𝑡(𝛼) , and each 𝛽𝑘 is a V-
dimensional Dirichlet random variable 𝐷𝑖𝑟𝑖𝑐ℎ𝑙𝑒𝑡(𝜂). The resulting probability of a corpus D
generated through the process is
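$$\Pr(D \mid \alpha, \eta) = \prod_{d=1}^{M} \int \cdots \int \prod_{k=1}^{K} \Pr(\beta_k \mid \eta)\, \Pr(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{d,n}} \Pr(z_{d,n} \mid \theta_d)\, \Pr(w_{d,n} \mid \beta_{z_{d,n}}) \right) d\theta_d\, d\beta_1 \cdots d\beta_K \tag{2}$$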
Dirichlet distribution is a multivariate generalization of the beta distribution, with
probability density function as
$$f(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \tag{3}$$
where $x_1, \dots, x_K$ sum to 1, and $\alpha_1, \dots, \alpha_K$ are the parameters of the distribution. The Dirichlet
distribution is the conjugate prior of the categorical distribution and is often used
as its prior. When the prior is a
Dirichlet distribution and the data are drawn from a categorical distribution, as in the case of LDA,
the posterior distribution is also a Dirichlet distribution.
With the conjugacy between the Dirichlet and categorical distributions,
the optimization of the probability of a corpus becomes tractable and we are able to estimate the
latent variables by maximum likelihood methods. $\alpha$ and $\eta$ are hyper-parameters of the model,
and they can be tuned for different model behaviors. For example, abstracts contain fewer
topics with lower $\alpha$ and more topics with higher $\alpha$. Following Griffiths and
Steyvers (2004) and Steyvers and Griffiths (2007), we choose $\alpha = 50/K$ and $\eta = 0.025$ in
our analysis.
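
The effect of $\alpha$ on topic concentration can be illustrated by sampling topic vectors from a symmetric Dirichlet distribution. The sketch below uses the standard construction of a Dirichlet draw from normalized Gamma draws; the function names, the choice of K = 5, and the two values of $\alpha$ are assumptions made only for illustration.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """One draw from a symmetric K-dimensional Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [g / total for g in draws]

def mean_largest_share(alpha, k=5, n_samples=2000, seed=0):
    """Average share of the dominant topic across simulated topic vectors."""
    rng = random.Random(seed)
    largest = [max(sample_dirichlet(alpha, k, rng)) for _ in range(n_samples)]
    return sum(largest) / n_samples

low = mean_largest_share(0.1)    # small alpha: one topic tends to dominate
high = mean_largest_share(10.0)  # large alpha: shares are spread more evenly
print(low > high)
```

A larger average dominant share under the smaller $\alpha$ corresponds to abstracts that concentrate on fewer topics.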
Various properties of LDA are worth noting. LDA is a bag-of-words language model,
where each abstract is modeled as the occurrence frequency of each word inside the abstract.
This approach ignores word order and simplifies the computation complexity. Hansen,
McMahon, and Prat (2014) argue that the resulting information loss has little impact on our
goal of determining the topic coverage. In addition, LDA is an “unsupervised” machine
learning algorithm. This means that the algorithm requires no pre-assigned labels – it is enough
to simply feed the textual data into the algorithm. This unsupervised property significantly
reduces workload when processing big data.
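
As a small illustration of the bag-of-words assumption, the following sketch represents an abstract by its word counts only. The helper name `bag_of_words` is ours; the tokens are taken from the hypothetical example in this section.

```python
from collections import Counter

def bag_of_words(tokens):
    """Represent an abstract by word counts, discarding word order."""
    return Counter(tokens)

original = bag_of_words(["bank", "crucial", "invest"])
shuffled = bag_of_words(["invest", "bank", "crucial"])
print(original == shuffled)  # the two orderings give the same representation
```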
4.2. Dynamic Topic Model
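In DTM, for each abstract published in a discrete time period t, the parameters $\alpha$ and $\beta_k$ are
replaced by $\alpha_t$ and $\beta_{t,k}$, which evolve with Gaussian noise from $\alpha_{t-1}$ and $\beta_{t-1,k}$,
respectively. A simple version of such a model is
$$\beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}(\beta_{t-1,k}, \sigma^2 I) \tag{4}$$
and
$$\alpha_t \mid \alpha_{t-1} \sim \mathcal{N}(\alpha_{t-1}, \delta^2 I) \tag{5}$$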
In our experiment, we set t as the publishing year of an abstract. Therefore, in each year
we obtain a different 𝛽𝑡,𝑘, the probability of each word that appears in each topic. Then we can
observe the evolution of word usage of every topic.
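
The random-walk dynamics of Eq. (4) can be simulated with a short sketch. It assumes, following the construction in Blei and Lafferty (2006), that the Gaussian state is an unconstrained vector that is mapped to word probabilities by a softmax; the function names, the starting vector, and the value of `sigma` are hypothetical.

```python
import math
import random

def softmax(values):
    """Map an unconstrained vector to probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_topic_drift(beta0, n_years, sigma, seed=0):
    """Random walk beta_t = beta_{t-1} + N(0, sigma^2 I), returned as word probabilities per year."""
    rng = random.Random(seed)
    states = [list(beta0)]
    for _ in range(n_years - 1):
        states.append([b + rng.gauss(0.0, sigma) for b in states[-1]])
    return [softmax(state) for state in states]

# Three words, five years, starting from equal word probabilities.
path = simulate_topic_drift([0.0, 0.0, 0.0], n_years=5, sigma=0.1)
for year, probs in enumerate(path):
    print(year, [round(p, 3) for p in probs])
```

Each row is one year's word distribution for a single topic, so reading down the rows shows how word usage within the topic drifts over time.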
Apart from the discrete time DTM described above, continuous time DTM is proposed
by Wang, Blei, and Heckerman (2012). Rather than being discrete, t can take on any point on
a continuous timeline. While continuous time DTM is useful for high-frequency textual data,
such as tweets from Twitter, it is hardly applicable in our project that mainly uses yearly data.
5. Results
We apply LDA and DTM to the abstracts in 17 finance academic journals. The dataset
contains 20,185 abstracts and 12,046 unique words and phrases. After the cleaning process as
in Section 3.1, we are left with 5,332 unique words and phrases. Summary statistics of the 17
journals are listed in Table 1, including each journal’s time horizon and number of articles with
abstracts. For example, JF in our sample starts from 1976, the year when Web of Science
started storing its data¹⁵. Fig. A.1 plots the number of active journals and articles every year.
15 JF was founded in 1946 and JFE had its first publication in 1974, but Web of Science starts storing
these two journals’ data only from 1976. Moreover, Web of Science stores the article abstracts of JF
from 1991 and the article abstracts of JFQA from 1992. Table 1 lists the journals and their summary
statistics, including the first years that Web of Science stores the abstracts. We supplement the missing
5.1. Appropriate Number of Topics
To determine the appropriate number of topics, we run LDA for different numbers of topics
and compare the log-likelihood of the data under each trained model. To guard against
overfitting, we compute the probability of a set of abstracts that the estimated model has not
seen during training. The optimal number of topics is the one with
the highest computed probability. Fig. 1 reports the log-likelihood of the data from the
trained models with different numbers of topics. The number of topics with the highest likelihood
is approximately 40. In implementing this approach, we find that some topics represent
general sentences and do not indicate a specific research interest. For example, a topic with
keywords “relat”, “posit”, “neg”, “associ” and “evid” may simply represent an often used
general sentence “we provide evidences on a positive/negative relation/association”. Therefore,
we finally choose 50 topics when implementing LDA and exclude 12 general sentence topics
from them. A full list of general sentence topics is presented in Appendix Table A.2.
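
The selection rule described above amounts to taking the candidate number of topics with the highest held-out log-likelihood. The sketch below shows only this selection step; `pick_num_topics` is our name, and `toy_score` is a hypothetical placeholder for training LDA with K topics and scoring unseen abstracts, not a result from the paper.

```python
def pick_num_topics(candidates, held_out_log_likelihood):
    """Return the candidate K with the highest held-out log-likelihood, plus all scores."""
    scores = {k: held_out_log_likelihood(k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

def toy_score(k):
    """Hypothetical placeholder scorer with a single peak; real scores come from a trained model."""
    return -float((k - 40) ** 2)

best_k, all_scores = pick_num_topics(range(10, 101, 10), toy_score)
print(best_k)
```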
5.2. Naming the Topics
Table 2 presents each topic’s top ten keywords generated by LDA, i.e. the ten words
with the highest appearing probability in each topic. We name each topic by reading the
keywords and the articles that belong to it. For example, if we observe that “bank”, “loan”,
“borrow”, “lend”, “commerce” and “deposit” appear in one topic, after reading the articles
belonging to this topic, we name it as “Commercial Banking”; if we observe that “ceo”,
“manag”, “board”, “compens”, “incent”, “director” appear in one topic, we name it as “CEO,
abstracts of JF between 1976 and 1990 and those of JFQA between 1984 and 1991 from JSTOR and
manual search.
Board, Director”. The abstracts are categorized into 38 research topics and 12 general sentence
topics.
As we explained in the previous example of Laeven and Levine (2009), each abstract
has a quantitative distribution over topics. We define an abstract as focusing on a topic
if more than 10% of its distribution falls on that topic. An abstract with a higher share of a
certain topic tends to have more keywords for that topic. An abstract may have two or more topic focuses. For
example, Laeven and Levine (2009) is 12.7% on “Systematic Risk and Risk Premium”, 11%
on “Shareholder Right, Ownership Structure”, 10.3% on “Commercial Banking” and 10.1%
on “Financial Regulation”. Therefore, Laeven and Levine (2009) focuses on the four topics
“Systematic Risk and Risk Premium”, “Shareholder Right, Ownership Structure”,
“Commercial Banking” and “Financial Regulation” by our definition.
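
The 10% rule can be written as a simple filter over an abstract's topic shares. In the sketch below, the four shares for Laeven and Levine (2009) are the ones reported above; the share for “Option Pricing” is a hypothetical value added to show a topic below the threshold, and the function name is ours.

```python
FOCUS_THRESHOLD = 0.10  # an abstract focuses on a topic if its share exceeds 10%

def focus_topics(topic_shares, threshold=FOCUS_THRESHOLD):
    """Return the topics whose share exceeds the threshold, largest share first."""
    ranked = sorted(topic_shares.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, share in ranked if share > threshold]

laeven_levine_2009 = {
    "Systematic Risk and Risk Premium": 0.127,
    "Shareholder Right, Ownership Structure": 0.110,
    "Commercial Banking": 0.103,
    "Financial Regulation": 0.101,
    "Option Pricing": 0.020,  # hypothetical share, for illustration only
}

print(focus_topics(laeven_levine_2009))
```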
“Option Pricing” is the topic with the most publications that focus on it, followed by
“Commercial Banking”, “CEO, Board, Director”, “Market Microstructure”, “Central Bank,
Monetary Policy”, and “Mergers and Acquisitions”.
Table 3 lists the most cited articles in each topic. The citation numbers were collected on
February 25, 2016. The year of publication is in parentheses, and the number of citations follows
the comma. Table 3 reports Web of Science citations. In Appendix Table
A.3, we present the most cited articles in each topic by Google Scholar citations.
5.3. Historical Trend of Topics
Fig. 2 presents the historical evolution of topics. The topics are identified by LDA. The
horizontal axis represents the year of publication. The vertical axis represents the average
percentage for a given topic across abstracts in a given year, which can be interpreted as
the popularity of the topic as a research interest. It is computed as
$$p_{it} = \frac{1}{N_t} \sum_{k=1}^{N_t} p_{itk} \qquad (6)$$
where $p_{itk}$ is the percentage distribution on topic $i$ of abstract $k$ published in year $t$,
and $N_t$ is the total number of articles published in year $t$.¹⁶ For example, the average
percentage distribution on “CEO, Board, Director” across all abstracts rose from about 1.5% in
1980 to about 2.5% in 2015. We choose 50 topics, including 12 general sentence topics, and the
total percentage of 100% is distributed over the 50 topics. A topic therefore attracts
above-average attention, and can be seen as “popular”, if its percentage is higher than
100%/50 = 2%.
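The yearly average in Eq. (6) and the 2% popularity benchmark can be illustrated with a short Python sketch. The function names, the data layout (a list of `(year, shares)` pairs) and the toy numbers are all assumptions made for illustration; they are not taken from the study's data.

```python
from collections import defaultdict


def yearly_topic_popularity(abstracts, topic):
    """Average share of `topic` across the abstracts published in each year (Eq. 6)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, shares in abstracts:
        totals[year] += shares.get(topic, 0.0)  # an abstract with no weight contributes 0
        counts[year] += 1                       # but still counts toward N_t
    return {year: totals[year] / counts[year] for year in sorted(counts)}


def is_popular(average_share, n_topics=50):
    """A topic is 'popular' if its average share exceeds the uniform share 1/n_topics."""
    return average_share > 1.0 / n_topics


# Hypothetical toy data: (publication year, topic shares of one abstract).
toy_abstracts = [
    (1980, {"CEO, Board, Director": 0.01}),
    (1980, {"CEO, Board, Director": 0.02}),
    (2015, {"CEO, Board, Director": 0.05}),
    (2015, {"Option Pricing": 0.20}),
]

popularity = yearly_topic_popularity(toy_abstracts, "CEO, Board, Director")
for year, share in popularity.items():
    print(year, round(share, 4), is_popular(share))
```

The denominator is the number of all abstracts in the year, not only those that mention the topic, which matches the definition of $N_t$ in Eq. (6).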
We observe that the research interest in “Financial System, Banking Crisis” often spiked
around or after the financial crises, such as the savings and loan crisis in the late 1980s and
early 1990s. It grew even faster after the 2008 financial crisis. Research interest in “CEO,
Board, Director” has grown steadily over the past 40 years. Other topics that attracted more
attention include “Behavioral Finance”, “Central Bank, Monetary Policy”, “Commercial
Banking”, “Corporate Cash Holding”, “Hedge Fund, Mutual Fund”, “International Capital
Markets”, “Social Network and Cultural Effect”, “Venture Capital, Entrepreneurship”, and
“Volatility”. The research interest in topics like “Bond Term Structure” and “Optimal Choice
Model” has been shrinking.
It is worth noting that the values fluctuate strongly in the 1970s and 1980s for most
of the topics. Fig. A.1 plots the number of active journals and articles every year. In the 1970s
and 1980s, there were fewer journals and articles, resulting in more volatile values. Fig. A.2
plots the yearly publication numbers in JF, JFE, and RFS. Zivney and Bertin (1992) explain
that output has remained constant since the 1980s, following rapid growth in the number of
journals and articles published in the 1960s and 1970s. Our results show continuous growth
in the number of journals and articles after the 1990s.
16 We also computed $p'_{it} = \sum_{k=1}^{N'_{it}} p_{itk} / N_t$, where $N'_{it}$ is the total number of
articles that focus on topic $i$ in year $t$, and obtain robust results.
5.3.1. Topics with Fastest Growth and Contraction
Fig. 3.1 plots the three fastest growing and the three fastest shrinking topics in the 17
journals. The topics are identified by LDA. The horizontal axis represents the year of
publication. The vertical axis represents the popularity of the given topic, calculated as the
average across articles of each article’s percentage distribution on that topic. “Financial
System, Banking Crisis”, “Hedge Fund, Mutual Fund” and “Social Network and Cultural Effect”
grew fastest from 2006 to 2015. “Market Microstructure”, “IPO” and “Option Pricing”
experienced the greatest contraction during the same period.
Fig. 3.2 plots the topics with the fastest increase and decrease in popularity in JF, JFE, and
RFS. “Social Network and Cultural Effect”, “Default and CDS”, and “CEO, Board, Director”
grew fastest from 2006 to 2015. “IPO”, “News, Analyst Report, Earnings Announcement”, and
“Determinants of Stock Return” experienced the greatest contraction during the same period.
5.3.2. Working Papers
For many articles, there is a time lag between first circulation and final publication,
and the lag can sometimes be several years. Therefore, the trend of published articles that we
show in Fig. 3 may not reflect the most recent dynamics of finance research. To address
this concern, we apply the LDA model trained on published articles to 130,547 working
paper abstracts that we obtained from the SSRN Financial Economics Network. We do not use
working papers uploaded to IDEAS because IDEAS does not distinguish working papers in
finance from those in economics.
We present the three fastest growing and the three fastest shrinking topics among working
papers in Fig. 4, which is similar in format to Fig. 3. Fig. 4 reports the rise and fall of each
topic’s popularity from 2006 to 2015. The horizontal axis represents the year of publication.
The vertical axis represents the popularity of the given topic, calculated as the average
across articles of each article’s percentage distribution on that topic.
Working papers on “Social Network and Cultural Effect”, “News, Analyst Report,
Earnings Announcement”, and “International Capital Markets” grew fastest from 2006 to 2015.
“Market Microstructure”, “Macro Finance”, and “Statistical Estimation Methodology”
experienced the greatest contraction during the same period.
5.3.3. JEL Classification Codes
In some journals such as JFE and RFS, JEL codes are reported when articles are
published. In other journals such as JF and JFQA, JEL codes are not reported in published
articles.
In this section, we compare the algorithm-computed topics of the articles with their
self-reported JEL codes by listing the most reported JEL codes of each topic, shown in Table 4.
The explanation of each JEL code is in Table A.4. We list the 5 most reported JEL codes among
the articles belonging to each topic. Of the resulting 190 (= 5 × 38) JEL codes, 161 are in
the G category (Financial Economics).
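The tabulation described above (the most reported JEL codes per topic, and the share of them in the G category) can be sketched as follows. The function names and the toy JEL lists are hypothetical and serve only to illustrate the counting; they are not the study's data or code.

```python
from collections import Counter


def top_jel_codes(jel_lists, k=5):
    """Return the k most frequently reported JEL codes among a topic's articles."""
    counts = Counter(code for codes in jel_lists for code in codes)
    return [code for code, _ in counts.most_common(k)]


def share_in_category(codes, category="G"):
    """Fraction of codes whose first letter matches the given JEL category."""
    if not codes:
        return 0.0
    return sum(code.startswith(category) for code in codes) / len(codes)


# Hypothetical self-reported JEL codes of four articles in one topic.
toy_topic_articles = [
    ["G34", "J33"],
    ["G34", "G30"],
    ["G34", "J33", "G30"],
    ["G34", "J33"],
]

top_codes = top_jel_codes(toy_topic_articles, k=3)
print(top_codes, round(share_in_category(top_codes), 3))
```

Applied to the figures reported in the text, the same ratio is 161/190, or about 84.7% of the listed codes in the G category.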
In some algorithm-computed topics, JEL codes that are not in G category (Financial
Economics) are also among the most reported. For example, in the topic of “CEO, Board,
Director”, J33 (Compensation Packages, Payment Methods) in J category (Labor and
Demographic Economics) is also one of the most reported JEL codes, reflecting the body of
research on CEO compensation. In the topic of “International Asset Pricing and Foreign
Exchange”, F31 (Foreign Exchange) and F36 (Financial Aspects of Economic Integration) in
F3 category (International Finance) are two of the five most reported JEL codes. In the topic
of “Statistical Estimation Methodology”, two JEL codes in C category (Mathematical and
Quantitative Methods) are among the five most reported JEL codes.
“Central Bank, Monetary Policy” is the only topic that does not have any of its five most
reported JEL codes in G category. Instead, two are in E category (Macroeconomics and
Monetary Economics) and three are in F category (International Economics). The two JEL codes
in E category are E52 (Monetary Policy) and E58 (Central Banks and Their Policies). The three
JEL codes in F category are F31 (Foreign Exchange), F41 (Open Economy Macroeconomics),
and F32 (Current Account Adjustment, Short-Term Capital Movements).
We find that the algorithm-computed topics of the articles are comparable with their
self-reported JEL codes. Therefore, we are able to apply the unsupervised machine learning
algorithm to the articles without JEL codes to determine their topics.
5.3.4. Evolution of Research Interests within Topics
Table 5 reports the results of the Dynamic Topic Model (DTM): the evolution of interest
within topics. We report the results every 5 years. When implementing DTM, we use 50 topics and
the same hyper-parameters as we used with LDA, so that the results are comparable with our LDA
results. In Table 5, the words under each period are ranked by frequency; words in higher
positions appear more frequently.
In Panel A, the topic of “CEO, Board, Director”, the use of “manager/management” and
“control” declined after 2000, while the use of “CEO” and “board” rose.
In Panel B, the topic of “Determinants of Stock Return”, the January effect was a
top theme before 1995. Since 2000, the January effect has not been on the list of the most
frequent words. Instead, “momentum” and “cross-section” have ranked higher over the years.
In Panel C, the topic of “Commercial Banking”, we observe a rise of research interest
in lending and networks, accompanied by a decline in interest in deposits.
5.3.5. Trend of Cross-topic Research
In this section, we study whether cross-topic research has become more common over the
years. Put another way, we examine whether research articles have become broader or narrower
in terms of research topic coverage. To measure how broad an article is, we calculate the
Herfindahl Index of each abstract:
$$H = \sum_{i=1}^{38} s_i^2 \qquad (7)$$
where 𝑠𝑖 represents the percentage distribution of the abstract on topic 𝑖.
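As a minimal sketch of Eq. (7), the Herfindahl Index of one abstract can be computed as below. The function name and the two example distributions are assumptions for illustration. The sketch also assumes that shares are expressed as fractions over the 38 research topics; the text does not state whether shares are renormalized after the general sentence topics are set aside.

```python
def herfindahl_index(shares):
    """Sum of squared topic shares (Eq. 7); shares are fractions, not percentages."""
    return sum(share ** 2 for share in shares)


# Two hypothetical abstracts over 38 research topics.
single_topic = [1.0] + [0.0] * 37   # all weight on one topic: narrowest possible
evenly_spread = [1.0 / 38] * 38     # equal weight on every topic: broadest possible

print(herfindahl_index(single_topic))
print(round(herfindahl_index(evenly_spread), 4))
```

A higher index means that the abstract concentrates on fewer topics (narrower research), while a lower index means that its weight is spread over more topics (broader, more cross-topic research).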
Fig. 5 presents the trend in the concentration of published articles’ research interest.
The solid line represents the average Herfindahl Index of abstracts in the 17 journals; the
dashed line represents the average Herfindahl Index of abstracts in JF, JFE, and RFS. The
average Herfindahl Index dropped sharply from 1976 to 1982, perhaps because pioneering works
in many topics started to emerge during this early period and cross-topic research was
therefore more common. The two lines went up between 1982 and 2000, indicating that research
on average became narrower. One possible explanation is that many topics matured and their
literature was established after two decades of development, so researchers made more
incremental contributions. The average Herfindahl Index of abstracts in the 17 journals
continued to increase after 2000, while that in the three top journals remained roughly
constant and even declined after 2010, indicating that the three top journals still publish
broader, more cross-topic articles.
5.4. Citation Network Between Topics
To understand how topics relate to each other and the “distance” between them, we
use the cross-reference data of each article to construct a citation network between topics. In
Fig. 6, there are 38 nodes, each representing a topic. A node’s size is proportional to
the number of articles that focus on the topic that the node represents, so topics with more
articles have larger nodes. As defined in Section 5.2, an abstract focuses on a topic if it has
over 10% distribution on it.
The nodes are connected through edges. An edge represents the cross-references between
two topics, and it is thicker if there are more cross-references. For example, if topic A has
$N$ articles, these articles cite articles in topic B a total of $\sum_{i=1}^{N} R_{iB}$ times,
where $R_{iB}$ is the number of times that article $i$ cites articles in topic B. Similarly, if
topic B has $M$ articles, these articles cite articles in topic A a total of
$\sum_{j=1}^{M} R_{jA}$ times, where $R_{jA}$ is the number of times that article $j$ cites
articles in topic A. The total number of cross-references is then
$\sum_{i=1}^{N} R_{iB} + \sum_{j=1}^{M} R_{jA}$, and the thickness of the edge between A and B
is proportional to it.
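The edge weight described above can be sketched in Python as follows. The function name, the input layout and the toy citations are assumptions made for illustration. In particular, the text does not specify how a citation is counted when an article focuses on several topics; this sketch counts it once for every pair of distinct topics, which is one possible convention.

```python
from collections import Counter


def edge_weights(citations, article_topics):
    """Count undirected cross-references between topics.

    citations: iterable of (citing_article, cited_article) pairs.
    article_topics: dict mapping an article id to the set of topics it focuses on.
    """
    weights = Counter()
    for citing, cited in citations:
        for topic_a in article_topics.get(citing, ()):
            for topic_b in article_topics.get(cited, ()):
                if topic_a != topic_b:
                    # frozenset makes the key symmetric: A->B and B->A share one edge
                    weights[frozenset((topic_a, topic_b))] += 1
    return weights


# Hypothetical toy data: two articles in topic A, one in topic B.
toy_topics = {"a1": {"A"}, "a2": {"A"}, "b1": {"B"}}
toy_citations = [("a1", "b1"), ("a2", "b1"), ("b1", "a1"), ("a1", "a2")]

weights = edge_weights(toy_citations, toy_topics)
print(weights[frozenset(("A", "B"))])
```

In the toy data, the two A-to-B citations and the one B-to-A citation add up to the edge weight between A and B, while the citation within topic A is ignored because it does not cross topics.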
Each node is positioned by a force-directed gravity algorithm called “Force Atlas 2”
(Jacomy, Venturini, Heymann and Bastian, 2014); a node settles in the position where the forces
acting on it along its edges are balanced. Intuitively, the algorithm assumes a force that
pushes every node outward from the center; it also allows every node to exert gravity
on its connected nodes and pull them inward. Each node is connected with other nodes via
edges, and a thicker edge represents greater gravity. If a topic (node A) has few
cross-references (a thin edge) with another topic (node B) and many cross-references (a thick
edge) with a third topic (node C), then node A exerts greater gravity on node C than on node B.
Therefore, topics with more cross-references are “attracted” closer to each other by the edges
that connect them. The network structure evolves dynamically and eventually reaches an
equilibrium in which the topics with more cross-references cluster together. The relative
position of the nodes is thus determined by the algorithm, not chosen by us. The distance
between two nodes approximately represents how closely the two topics are related in terms of
cross-references.
We conduct modularity analysis to categorize the topics into clusters based on the
computation of the distance and attraction between the nodes. The number of clusters is
determined by the modularity analysis algorithm, and the 38 topics are grouped into 5
clusters, or “territories”: asset pricing, corporate finance, market microstructure, banking and
macro finance, and “mixed areas”. Each node’s color reflects the territory it belongs to.
The left side¹⁷ of Fig. 6 is clustered with corporate finance topics, including large topics
such as “CEO, Board, Director”, “Mergers and Acquisitions”, “Shareholder Right, Ownership
Structure”, and “IPO”. The bottom side is clustered with banking and macro finance topics,
including large topics such as “Commercial Banking”, “Central Bank, Monetary Policy”,
“Financial System, Banking Crisis”, and “Financial Regulation”. The right side is clustered
with asset pricing topics, including large topics such as “Option Pricing”, “Volatility”, “Return
Distribution and Value-at-Risk (VaR)”, and “Bond Term Structure”. The center is
clustered with market microstructure topics, including large topics such as “Market
Microstructure”, “Trader Behavior”, and “Information Asymmetry, Disclosure, Insider
Trading”. The upper side is clustered with “mixed areas”, including large topics such as “Hedge
Fund, Mutual Fund”, “News, Analyst Report, Earnings Announcement”, “Behavioral Finance”,
and “Statistical Estimation Methodology”.
5.5. Bibliometric Regularity
Fig. 7 presents a bibliometric regularity: the number of researchers covering $n$ topics is
approximately $1/2^n$ of those covering just one topic. A researcher covers a topic if she
publishes at least one article with over 10% distribution on that topic. The horizontal axis of
Fig. 7 represents the number of topics, and the vertical axis represents the number of
researchers.
17 The “Force Atlas 2” force-directed gravity algorithm only determines the relative position of nodes.
The network can be rotated clockwise or counter-clockwise. Here the left, right, upper and bottom
sides are only for explanatory purpose on Fig. 6.
The solid line is generated from our data. It is downward sloping because fewer
researchers are able to cover more topics. Each point on the line indicates how many
researchers cover exactly a given number of topics. For example, the first point on the solid
line is (1, 6830), meaning that 6830 researchers publish articles that focus on just one topic.
The second point is (2, 3507), meaning that 3507 researchers publish articles that focus on
exactly two topics. We use the dashed line $y = 13251/2^n$ to fit the solid line, where $y$ is
the number of researchers covering $n$ topics. When $n = 1$, $y = 6625.5$; when $n = 2$,
$y = 3312.75$. The R-squared value of the fit is 0.998.
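The fitted curve can be evaluated with a short Python sketch. The function name is hypothetical, and the scale constant 13251 is an assumption: it is the value implied by the reported fitted value at $n = 1$ (2 × 6625.5). Only the two observed points quoted in the text are used.

```python
def fitted_researchers(n, scale=13251.0):
    """Fitted number of researchers covering n topics, y = scale / 2**n.

    The default scale is the constant implied by the reported fitted value
    y = 6625.5 at n = 1; treat it as an assumption of this sketch.
    """
    return scale / 2 ** n


# The two observed points quoted in the text: (topics covered, researchers).
observed = {1: 6830, 2: 3507}

for n, count in observed.items():
    fit = fitted_researchers(n)
    print(n, count, fit, round(count / fit, 3))
```

Each additional topic halves the fitted count, so the ratio of observed to fitted values shows how closely the data follow this halving pattern at each $n$.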
6. Conclusion
How did finance research topics evolve over the past forty years? In this article, we
apply the latent Dirichlet allocation (LDA) model to 20,185 abstracts of finance articles
published between 1976 and 2015, and identify 38 research topics. We present the fastest
growing topics of published articles and working papers in the past decade. For example,
publications on “Financial System, Banking Crisis” and “Hedge Fund, Mutual Fund” grew the
fastest from 2006 to 2015, while working papers on “Social Network and Cultural Effect” and
“News, Analyst Report, Earnings Announcement” grew the fastest during the same period. We use
a citation network to present how topics are related, and cluster the topics into five
“territories”: asset pricing, corporate finance, market microstructure, banking and macro
finance, and “mixed areas”, which include “Social Network and Cultural Effect” and “Venture
Capital, Entrepreneurship”, among others. We find that the algorithm-computed topics of the
articles are comparable with their self-reported JEL codes, which implies that our algorithm
can be used to categorize the articles without JEL codes. Moreover, we find a strong
bibliometric regularity: the number of researchers covering $n$ topics is approximately $1/2^n$
of those covering just one topic. We also find that, on average, a finance publication has
been covering fewer topics and has therefore become narrower over the years. To the best of
our knowledge, this is among the first machine learning studies of academic finance
publications. Overall, we hope that our study may be beneficial to those who desire to have an
overview of this academic profession and may inspire more cross-topic research.
References
Alexander, J. C., & Mabry, R. H. (1994). Relative Significance of Journals, Authors, and
Articles Cited in Financial Research. The Journal of Finance, 49(2), 697-712.
Azoulay, P., Wang, J., & Zivin, J. G. (2010). Superstar Extinction. Quarterly Journal of
Economics, 125(2).
Blei, D. M., & Lafferty, J. D. (2006). Dynamic Topic Models. In Proceedings of the 23rd
International Conference on Machine Learning (pp. 113-120). ACM.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine
Learning Research, 3(Jan), 993-1022.
Borokhovich, K. A., Bricker, R. J., Brunarski, K. R., & Simkins, B. J. (1995). Finance Research
Productivity and Influence. The Journal of Finance, 50(5), 1691-1717.
Borokhovich, K. A., Bricker, R. J., & Simkins, B. J. (1994). Journal Communication and
Influence in Financial Research. The Journal of Finance, 49(2), 713-725.
Brogaard, J., Engelberg, J., & Parsons, C. A. (2014). Networks and Productivity: Causal
Evidence from Editor Rotations. Journal of Financial Economics, 111(1), 251-270.
Chung, K. H., & Cox, R. A. (1990). Patterns of Productivity in the Finance Literature: A Study
of the Bibliometric Distributions. The Journal of Finance, 301-309.
Cleary, F. R., & Edwards, D. J. (1960). The Origins of the Contributors to the AER During the
‘Fifties. The American Economic Review, 50(5), 1011-1014.
Corrado, C. J., & Ferris, S. P. (1997). Journal Influence on the Design of Finance Doctoral
Education. The Journal of Finance, 52(5), 2091-2102.
Froman, L. A. (1952). Graduate Students in Economics. The American Economic
Review, 42(4), 602-608.
Griffiths, T. L., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National
Academy of Sciences, 101(suppl 1), 5228-5235.
Hansen, S., McMahon, M., & Prat, A. (2014). Transparency and Deliberation within the FOMC:
A Computational Linguistics Approach. Working Paper.
Heck, J. L., Cooley, P. L., & Hubbard, C. M. (1986). Contributing Authors and Institutions to
the Journal of Finance: 1946-1985. The Journal of Finance, 41(5), 1129-1140.
Henry, W. R., & Burch, E. E. (1974). Institutional Contributions to Scholarly Journals of
Business. The Journal of Business, 47(1), 56-66.
Hoberg, G., & Phillips, G. (2010). Product Market Synergies and Competition in Mergers and
Acquisitions: A Text-based Analysis. The Review of Financial Studies, 23(10), 3773-3811.
Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous
graph layout algorithm for handy network visualization designed for the Gephi software.
PloS One, 9(6), e98679.
Klemkosky, R. C., & Tuttle, D. L. (1977). The Institutional Source and Concentration of
Financial Research. The Journal of Finance, 32(3), 901-907.
Laband, D. N., & Piette, M. J. (1994). Favoritism versus Search for Good Papers: Empirical
Evidence Regarding the Behavior of Journal Editors. Journal of Political
Economy, 102(1), 194-203.
Laeven, L., & Levine, R. (2009). Bank Governance, Regulation and Risk Taking. Journal of
Financial Economics, 93(2), 259-275.
Loughran, T., & McDonald, B. (2011). When is a Liability not a Liability? Textual Analysis,
Dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.
Loughran, T., & McDonald, B. (2013). IPO First-day Returns, Offer Price Revisions, Volatility,
and Form S-1 Language. Journal of Financial Economics, 109(2), 307-326.
Loughran, T., & McDonald, B. (2014). Measuring Readability in Financial Disclosures. The
Journal of Finance, 69(4), 1643-1671.
Niemi, A. W. (1987). Institutional Contributions to the Leading Finance Journals, 1975
Through 1986: A Note. The Journal of Finance, 42(5), 1389-1397.
Schwert, G. W. (1993). The Journal of Financial Economics: A Retrospective Evaluation
(1974–1991). Journal of Financial Economics, 33(3), 369-424.
Steyvers, M., & Griffiths, T. (2007). Probabilistic Topic Models. In T. Landauer, D. McNamara,
S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A Road to Meaning.
Swidler, S., & Goldreyer, E. (1998). The Value of a Finance Journal Publication. The Journal
of Finance, 53(1), 351-363.
Tetlock, P. C. (2007). Giving Content to Investor Sentiment: The Role of Media in the Stock
Market. The Journal of Finance, 62(3), 1139-1168.
Tetlock, P. C., Saar‐Tsechansky, M., & Macskassy, S. (2008). More than Words: Quantifying
Language to Measure Firms’ Fundamentals. The Journal of Finance, 63(3), 1437-1467.
Wang, C., Blei, D., & Heckerman, D. (2012). Continuous Time Dynamic Topic Models. arXiv
preprint arXiv:1206.3298.
Welch, I. (2014). Referee Recommendations. Review of Financial Studies, 27(9), 2773-2804.
Zivney, T. L., & Bertin, W. J. (1992). Publish or Perish: What the Competition is Really
Doing. The Journal of Finance, 47(1), 295-329.
Table 1: Summary Statistics of Sample Journals
This table reports summary statistics of 20,185 articles published in 17 finance journals
between 1976 and 2015. We obtain each article’s title, authors, affiliations, abstract, full text,
references, citations and publishing date from Web of Science, supplemented with
ScienceDirect, JSTOR and manual search. We exclude articles without abstracts from our sample.
For example, The Journal of Finance (JF) and Journal of Financial Economics (JFE) start in our
sample in 1976, the first year for which we have their abstracts. We report the first and last
year for which each journal has abstracts in our sample. We have Review of Financial Studies
data from 1988, the year of its first volume. JF was founded in 1946 and JFE had its first
publication in 1974, but Web of Science stores these two journals’ data only from 1976.
Moreover, Web of Science stores the article abstracts of JF from 1991 and those of Journal of
Financial and Quantitative Analysis (JFQA) from 1992. We supplement the missing abstracts of
JF between 1976 and 1990 and those of JFQA between 1984 and 1991 from JSTOR and manual search.
We also report the total number of articles and the annual median number of articles published
in each journal in our sample, together with each journal’s share of the sample (%).
| Journal | First Year of Abstract | Last Year of Abstract | Total Number | Annual Median | % |
|---|---|---|---|---|---|
| Journal of Banking and Finance | 1977 | 2015 | 4104 | 75 | 20.3% |
| The Journal of Finance | 1976 | 2015 | 2465 | 69 | 12.2% |
| Journal of Financial Economics | 1976 | 2015 | 2304 | 47 | 11.4% |
| Journal of International Money and Finance | 1982 | 2015 | 1627 | 49 | 8.1% |
| Review of Financial Studies | 1988 | 2015 | 1505 | 37 | 7.5% |
| Journal of Money Credit and Banking | 1997 | 2015 | 1246 | 77 | 6.2% |
| Journal of Financial and Quantitative Analysis | 1984 | 2015 | 1168 | 35 | 5.8% |
| Quantitative Finance | 2001 | 2015 | 998 | 62 | 4.9% |
| Journal of Portfolio Management | 1992 | 2015 | 908 | 39 | 4.5% |
| Journal of Futures Markets | 1981 | 2015 | 870 | 50 | 4.3% |
| Journal of Corporate Finance | 1994 | 2015 | 833 | 46 | 4.1% |
| Journal of Business Finance and Accounting | 1976 | 2015 | 558 | 47 | 2.8% |
| Journal of Empirical Finance | 1993 | 2015 | 476 | 60 | 2.4% |
| Journal of Financial Intermediation | 1990 | 2015 | 416 | 18 | 2.1% |
| Journal of Financial Markets | 1998 | 2015 | 308 | 19 | 1.5% |
| Review of Finance | 1997 | 2015 | 289 | 27 | 1.4% |
| Journal of Financial Research | 1978 | 2015 | 110 | 30 | 0.5% |
| Total | 1976 | 2015 | 20185 | 358 | 100.0% |
Table 2: Keywords for Each Topic
This table reports each topic’s top ten keywords, ranked by their probability of appearing in the topic. The 38 topics are identified by the latent
Dirichlet allocation (LDA) model. The methodology of LDA is detailed in Section 4.1. We name each topic by reading the keywords and the articles
that belong to it. For example, if we observe that “bank”, “loan”, “borrow”, “lend”, “commerce” and “deposit” appear in one topic, after reading the
articles belonging to this topic, we name it “Commercial Banking”; if we observe that “ceo”, “manag”, “board”, “compens”, “incent”, “director”
appear in one topic, we name it “CEO, Board, Director”. The abstracts are categorized into 38 research topics and 12 general sentence topics. Each
abstract has a quantitative distribution over topics. We define an abstract as focusing on a topic if more than 10% of its distribution falls on that
topic. An abstract with a higher distribution on a certain topic tends to contain more keywords of that topic. An abstract may focus on two or more
topics. For example, Laeven and Levine (2009) is 12.7% on “Systematic Risk and Risk Premium”, 11% on “Shareholder Right, Ownership Structure”,
10.3% on “Commercial Banking” and 10.1% on “Financial Regulation”. Therefore, by our definition, Laeven and Levine (2009) focuses on the four
topics “Systematic Risk and Risk Premium”, “Shareholder Right, Ownership Structure”, “Commercial Banking” and “Financial Regulation”. We order
the topics by the number of articles that focus on each topic. “Option Pricing” is the topic with the most publications that focus on it, followed
by “Commercial Banking”, “CEO, Board, Director”, “Market Microstructure”, “Central Bank, Monetary Policy”, and “Mergers and Acquisitions”.
| No. | Topic | No. of Papers | Keywords ranked 1 to 10 (stemmed; a keyword may span two words) |
|---|---|---|---|
| 1 | Option Pricing | 890 | option process jump stochast underli exercis diffus american european black schole |
| 2 | Commercial Banking | 812 | bank loan borrow lend commerci deposit credit busi securit branch |
| 3 | CEO, Board, Director | 717 | ceo manag board compens incent director perform independ monitor execut |
| 4 | Market Microstructure | 677 | trade order spread exchang stock quot bid ask dealer nyse limit order |
| 5 | Central Bank, Monetary Policy | 650 | exchang rate shock respons monetari polici economi central bank interest rate intervent reserv stabil |
| 6 | Mergers and Acquisitions | 623 | target acquisit merger acquir takeov bid deal auction announc sharehold |
| 7 | Return Distribution and Value-at-Risk (VaR) | 599 | distribut method estim var normal extrem skew tail paramet simul |
| 8 | News, Analyst Report, Earnings Announcement | 572 | earn announc news analyst event report reaction stock abnorm return surpris |
| 9 | Hedge Fund, Mutual Fund | 560 | fund manag perform activ mutual fund hedg fund strategi invest fee alpha |
| 10 | Shareholder Right, Ownership Structure | 556 | control ownership govern sharehold compani right protect structur vote corpor |
| 11 | International Capital Markets | 521 | countri intern foreign develop domest unit state global integr region emerg market |
| 12 | IPO | 511 | issu ipo offer equiti underwrit initi public share underpr season |
| 13 | Capital Structure, Bankruptcy, Leverage | 487 | debt equiti leverag bankruptci capit structur corpor convert claim distress creditor |
| 14 | Macro Finance | 484 | inflat real output suppli incom labor busi cycl consum growth macroeconom |
| 15 | Volatility | 460 | volatil condit correl dynam varianc regim process depend garch switch |
| 16 | Default and CDS | 448 | rate credit default spread probabl swap mortgag agenc structur collater |
| 17 | Commodities, Futures | 436 | futur index hedg contract forward commod spot deriv oil underli |
| 18 | Trader Behavior | 402 | trade liquid volum day trader open pattern close intraday specul |
| 19 | Bond Term Structure | 401 | bond term interest rate yield matur short term term structur call rate treasuri |
| 20 | Determinants of Stock Return | 384 | return stock excess predict momentum januari revers anomali cross section season |
| 21 | Asset and Portfolio Allocation | 380 | asset portfolio return diversif varianc alloc mean correl riski covari |
| 22 | Asset Pricing Model | 380 | expect equilibrium gener uncertainti agent prefer consumpt ration risk avers belief |
| 23 | Financial Regulation | 380 | capit requir regul insur liabil limit act deposit insur failur polici |
| 24 | Statistical Estimation Methodology | 375 | estim forecast error predict regress statist bias paramet variabl coeffici |
| 25 | International Asset Pricing and Foreign Exchange | 350 | unit state currenc dollar european euro uk area spillov exchang rate japanes |
| 26 | Venture Capital, Entrepreneurship | 336 | invest financ capit decis extern constraint project opportun ventur entrepreneur |
| 27 | Industry Competition and Market Efficiency | 328 | effici industri product profit competit innov technolog improv cost structur |
| 28 | Tax | 316 | tax short sell loss interest sale arbitrag margin restrict incom |
| 29 | Financial System, Banking Crisis | 314 | financi crisi system import contagion stabil intermediari global stress failur |
| 30 | Multifactor Model | 291 | factor variabl explain variat compon cross section common specif power signific |
| 31 | Dividend Policy | 269 | growth dividend ratio share repurchas polici payout determin pay cash flow |
| 32 | Information Asymmetry, Disclosure, Insider Trading | 265 | privat public insid signal disclosur inform asymmetri improv transpar reveal avail |
| 33 | Optimal Choice Model | 252 | optim strategi maxim choic dynam program design minim condit transact cost |
| 34 | Corporate Operational Structure and Value Creation | 248 | firm corpor cash flow affect oper busi examin characterist level control |
| 35 | Systematic Risk and Risk Premium | 240 | risk premium exposur beta systemat expect idiosyncrat equiti sensit adjust |
| 36 | Behavioral Finance | 208 | investor behavior individu ex ant sentiment dispers tend retail herd |
| 37 | Corporate Cash Holding | 134 | cost higher lower hold cash greater level increas reduc payment |
| 38 | Social Network and Cultural Effect | 112 | institut particip group analysi social network influenc individu central affect |
Table 3: Most Cited Articles in Each Topic (Web of Science)
This table lists the most cited articles in each topic. The 38 topics are identified by the latent Dirichlet allocation (LDA) model. The methodology of
LDA is detailed in Section 4.1. The citation counts were collected on February 25, 2016. Each entry gives the author or coauthor names, followed by
the year of publication in parentheses and, after the comma, the number of Web of Science citations.
| No. | Topic | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 1 | Option Pricing | Heston (1993), 1684 | Cox, Ross, Rubinstein (1979), 1398 | Vasicek (1977), 1387 | Merton (1976), 1259 | Cox, Ross (1976), 842 |
| 2 | Commercial Banking | Sharpe (1990), 434 | Barth, Caprio, Levine (2004), 340 | Boot (2000), 334 | Petersen, Rajan (2002), 317 | Berger, Miller, Petersen, Rajan, Stein (2005), 268 |
| 3 | CEO, Board, Director | Yermack (1996), 975 | Weisbach (1988), 850 | Core, Holthausen, Larcker (1999), 649 | Amit, Villalonga (2006), 601 | Agrawal, Knoeber (1996), 404 |
| 4 | Market Microstructure | Lee, Ready (1991), 704 | Copeland, Galai (1983), 392 | Hamao, Masulis, Ng (1990), 366 | Glosten, Harris (1988), 355 | Huang, Stoll (1996), 275 |
| 5 | Central Bank, Monetary Policy | Eun, Shim (1989), 236 | Meese, Rogoff (1988), 217 | Sercu, Uppal, van Hulle (1995), 153 | Blanchard, Galí (2007), 149 | Thorbecke (1997), 136 |
| 6 | Mergers and Acquisitions | Jensen, Ruback (1983), 1035 | Morck, Shleifer, Vishny (1990), 497 | Bradley, Desai, Kim (1988), 385 | Moeller, Schlingemann, Stulz (2004), 335 | Shleifer, Vishny (2003), 329 |
| 7 | Return Distribution and Value-at-Risk (VaR) | Rockafellar, Uryasev (2002), 729 | Cont (2001), 506 | Longin, Solnik (2001), 481 | Rubinstein (1994), 407 | Jackwerth, Rubinstein (1996), 245 |
| 8 | News, Analyst Report, Earnings Announcement | Barberis, Shleifer, Vishny (1998), 727 | Fama, French (1995), 568 | Ikenberry, Lakonishok, Vermaelen (1995), 362 | Teoh, Welch, Wong (1998), 361 | Womack (1996), 332 |
| 9 | Hedge Fund, Mutual Fund | Carhart (1997), 1910 | Sirri, Tufano (1998), 471 | Daniel, Grinblatt, Titman, Wermers (1997), 440 | Wermers (1999), 285 | Wermers (2000), 278 |
| 10 | Shareholder Right, Ownership Structure | Shleifer, Vishny (1997), 2156 | La Porta, Lopez-de-Silanes, Shleifer (1999), 2027 | La Porta, Lopez-de-Silanes, Shleifer, Vishny (1997), 1927 | Claessens, Djankov, Lang (2000), 1004 | La Porta, Lopez-de-Silanes, Shleifer, Vishny (2000), 900 |
| 11 | International Capital Markets | Bekaert, Harvey (1995), 465 | Coval, Moskowitz (1999), 384 | Bekaert, Harvey (2000), 354 | Claessens, Demirgüç-Kunt, Huizinga (2001), 267 | Harvey (1995), 252 |
| 12 | IPO | Loughran, Ritter (1995), 671 | Ritter (1991), 614 | Carter, Manaster (1990), 590 | Rock (1986), 544 | Megginson, Weiss (1991), 521 |
| 13 | Capital Structure, Bankruptcy, Leverage | Smith, Warner (1979), 747 | Rajan (1992), 735 | Titman, Wessels (1988), 702 | Leland (1994), 467 | Deangelo, Masulis (1980), 411 |
| 14 | Macro Finance | Schwert (1989), 680 | Estrella, Hardouvelis (1991), 339 | Constantinides, Ferson (1991), 182 | Blanchard, Galí (2007), 149 | McCallum, Nelson (1999), 143 |
| 15 | Volatility | Glosten, Jagannathan, Runkle (1993), 1407 | Engle, Ng (1993), 798 | Andersen (2001), 484 | Pan (2002), 394 | Campbell, Hentschel (1992), 392 |
| 16 | Default and CDS | Jarrow, Lando, Turnbull (1997), 292 | Longstaff, Mithal, Neis (2005), 281 | Blanco, Brennan, Marsh (2005), 185 | Bharath, Shumway (2008), 181 | Crouhy, Galai, Mark (2000), 174 |
| 17 | Commodities, Futures | Black (1976), 712 | Schwartz (1997), 464 | Gibson, Schwartz (1990), 259 | Fama (1984), 221 | Stoll, Whaley (1990), 207 |
| 18 | Trader Behavior | Admati, Pfleiderer (1988), 704 | Brunnermeier, Pedersen (2009), 495 | French, Roll (1986), 481 | Easley, O'Hara (1987), 475 | de Long, Shleifer, Summers, Waldmann (1990), 405 |
| 19 | Bond Term Structure | Vasicek (1977), 1387 | Fama, French (1989), 779 | Chan, Karolyi, Longstaff, Sanders (1992), 489 | Longstaff, Schwartz (1995), 470 | Leland (1994), 467 |
| 20 | Determinants of Stock Return | Jegadeesh, Titman (1993), 1400 | Fama, French (1996), 1065 | Debondt, Thaler (1985), 1032 | Amihud (2002), 875 | French, Schwert, Stambaugh (1987), 823 |
| 21 | Asset and Portfolio Allocation | Demiguel, Garlappi, Uppal (2009), 252 | Jagannathan, Ma (2003), 205 | Best, Grauer (1991), 170 | Chopra, Ziemba (1993), 167 | Kim, Omberg (1996), 160 |
| 22 | Asset Pricing Model | Breeden (1979), 713 | Stulz (1981), 209 | Diamond, Verrecchia (1981), 196 | Breeden, Gibbons, Litzenberger (1989), 171 | Sundaresan (1989), 147 |
| 23 | Financial Regulation | Barth, Caprio, Levine (2004), 340 | Karpoff, Lee, Martin (2008), 137 | Marcus (1984), 127 | Buser, Chen, Kane (1981), 116 | Dahl, Shrieves (1992), 106 |
| 24 | Statistical Estimation Methodology | Petersen (2009), 1413 | Barber, Lyon (1997), 526 | Dimson (1979), 492 | Stambaugh (1999), 350 | Hodrick (1992), 306 |
| 25 | International Asset Pricing and Foreign Exchange | Hamao, Masulis, Ng (1990), 366 | Dittmar, Neely, Weller (1997), 189 | Peel, Taylor (2000), 185 | Lins, Servaes (1999), 137 | Cheung, Chinn (2001), 131 |
| 26 | Venture Capital, Entrepreneurship | Sahlman (1990), 586 | Hellmann, Puri (2002), 354 | Hellmann, Puri (2000), 249 | Hsu (2004), 230 | Gompers (1995), 198 |
| 27 | Industry Competition and Market Efficiency | Claessens, Laeven (2004), 218 | Klapper, Laeven, Rajan (2006), 211 | Berger, Deyoung (1997), 204 | Bonin, Hasan, Wachtel (2005), 188 | Gold, Sherman (1985), 183 |
28 Tax Shefrin, Statman (1985), 478 Lakonishok, Shleifer, Vishny (1992), 317 Claessens, Demirgüç-Kunt, Huizinga (2001), 267 Grinblatt, Keloharju (2001), 209 Miller, Scholes (1978), 205
29 Financial System, Banking Crisis Rajan, Zingales (2003), 601 Beck, Levine, Loayza (2000), 530 Allen, Qian, Qian (2005), 442 Hoshi, Kashyap, Scharfstein (1990), 297 Faccio, Masulis, McConnell (2006), 243
30 Multifactor Model Fama, French (1993), 3481 Fama, French (1992), 2381 Jagannathan, Wang (1996), 444 Harvey, Siddique (2000), 373 Daniel, Titman (1997), 353
31 Dividend Policy La Porta, Lopez-de-Silanes, Shleifer, Vishny (2000), 431 Fama, French (2001), 425 Fama, French (2002), 393 Brav, Graham, Harvey, Michaely (2005), 272 Grullon, Michaely (2002), 232
32 Information Asymmetry, Disclosure, Insider Trading Diamond, Verrecchia (1991), 425 Easley, O'Hara (2004), 395 Seyhun (1986), 247 Froot, Scharfstein, Stein (1992), 220 Blume, Easley, O'Hara (1994), 211
33 Optimal Choice Model Grossman, Hart (1988), 304 Admati, Pfleiderer (1994), 179 Harris, Raviv (1988), 170 Jorion (1986), 168 Kroll, Levy, Markowitz (1984), 143
34 Corporate Operational Structure and Value Creation Morck, Shleifer, Vishny (1988), 1452 Claessens, Djankov, Lang (2000), 1004 Almeida, Campello, Weisbach (2004), 319 Campa, Kedia (2002), 305 Coles, Daniel, Naveen (2008), 302
35 Systematic Risk and Risk Premium Harvey, Siddique (2000), 373 Acerbi, Tasche (2002), 293 Harvey (1991), 265 Laeven, Levine (2009), 263 Ferson, Harvey (1993), 240
36 Behavioral Finance Odean (1998), 586 Shefrin, Statman (1985), 478 Barber, Odean (2008), 303 Grinblatt (2000), 294 Lee, Shleifer, Thaler (1991), 182
37 Corporate Cash Holding Opler (1999), 394 Harford (1999), 232 Bates, Kahle, Stulz (2009), 190 Harford, Mansi, Maxwell (2008), 163 Dittmar, Mahrt-Smith, Servaes (2003), 145
38 Social Network and Cultural Effect Hong, Kubik, Stein (2004), 216 Boss, Elsinger, Summer, Thurner (2004), 116 Hong, Kacperczyk (2009), 102 Blinder, Morgan (2005), 81 Brown, Ivković, Smith, Weisbenner (2008), 74
Table 4: Most reported JEL Codes in Each Topic
This table presents the five most reported JEL codes in the articles belonging to each topic.
The 38 topics are identified by the latent Dirichlet allocation (LDA) model. The methodology of
LDA is detailed in Section 4.1. For each topic, we first find all articles that have at least 10%
distribution on it, pool the JEL codes reported in those articles, and count the occurrences
of each JEL code. Then we present the five most reported JEL codes for each topic. A detailed
explanation of each JEL code is listed in Table A.4.
Topic 1 2 3 4 5
Option Pricing G13 G12 G11 G14 C63
Commercial Banking G21 G28 G32 G34 G24
CEO, Board, Director G34 G32 G30 J33 G38
Market Microstructure G14 G15 G12 G10 G18
Central Bank, Monetary Policy F31 F41 E52 E58 F32
Mergers and Acquisitions G34 G32 G14 G21 G30
Return Distribution and Value-at-Risk (VaR) G12 G11 G13 C14 G21
News, Analyst Report, Earnings Announcement G14 G24 G12 M41 G11
Hedge Fund, Mutual Fund G11 G23 G12 G14 G20
Shareholder Right, Ownership Structure G32 G34 G38 G30 G21
International Capital Markets G15 F36 G11 F21 G21
IPO G24 G32 G14 G30 G34
Capital Structure, Bankruptcy, Leverage G32 G33 G34 G13 G31
Macro Finance F41 E31 G12 E52 G11
Volatility G12 C32 C22 G13 G10
Default and CDS G21 G12 G13 G33 G28
Commodities, Futures G13 G15 G11 G14 G12
Trader Behavior G14 G12 G15 G10 D82
Bond Term Structure G12 E43 G13 G32 G11
Determinants of Stock Return G12 G14 G11 G10 G15
Asset and Portfolio Allocation G11 G12 G23 G15 D81
Asset Pricing Model G12 G11 G14 G13 G10
Financial Regulation G21 G28 G22 G32 G11
Statistical Estimation Methodology G12 C22 G14 C53 G11
International Asset Pricing and Foreign Exchange F31 G15 F36 G12 G14
Venture Capital, Entrepreneurship G32 G31 G24 G34 G30
Industry Competition and Market Efficiency G21 G28 G32 G34 D24
Tax G14 G12 G32 G11 G34
Financial System, Banking Crisis G21 G01 G28 G15 F3
Multifactor Model G12 G11 G14 G15 G10
Dividend Policy G35 G32 G34 G12 G14
Information Asymmetry, Disclosure, Insider Trading G14 G32 D82 G21 G24
Optimal Choice Model G11 C61 D81 G32 G12
Corporate Operational Structure and Value Creation G32 G34 G30 G31 G38
Systematic Risk and Risk Premium G12 G11 G21 G32 G13
Behavioral Finance G11 G14 G12 G15 G10
Corporate Cash Holding G32 G31 G34 G21 D12
Social Network and Cultural Effect G11 G32 G14 G12 G34
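The counting procedure described in the caption of Table 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`topic_share`, `jel_codes`), the function name `top_jel_codes`, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def top_jel_codes(articles, topic, threshold=0.10, k=5):
    """Return the k most reported JEL codes among articles that have
    at least `threshold` of their topic distribution on `topic`."""
    counts = Counter()
    for article in articles:
        # An article belongs to the topic if its share is at least 10%.
        if article["topic_share"].get(topic, 0.0) >= threshold:
            counts.update(article["jel_codes"])
    return [code for code, _ in counts.most_common(k)]

# Hypothetical toy records; real inputs would be the LDA output per abstract.
articles = [
    {"topic_share": {"IPO": 0.40, "Tax": 0.05}, "jel_codes": ["G24", "G32"]},
    {"topic_share": {"IPO": 0.15}, "jel_codes": ["G24", "G14"]},
    {"topic_share": {"IPO": 0.02, "Tax": 0.60}, "jel_codes": ["G32", "G11"]},
]

print(top_jel_codes(articles, "IPO"))
```

Because an article can pass the 10% threshold for several topics, the same JEL code may be counted under more than one topic in this sketch.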
Table 5: The Evolution of Interest within Topics
This table reports the evolution of interest within topics by listing the high-frequency words in different years from the Dynamic Topic Model (DTM)
analysis. The methodology of DTM is detailed in Section 4.2. We report the results every 5 years. When implementing DTM, we use 50 topics
and the same hyper-parameters as in LDA so that the results are comparable with our LDA results. In each column, words are ranked
by frequency, with more frequent words in higher positions. We highlight the words that we explain in the text. In Panel A, the topic of
“CEO, Board, Director”, the use of “manager/management” and “control” declined after 2000, while research on “CEO” and “board” rose.
In Panel B, the topic of “Determinants of Stock Return”, the January effect was a top theme before 1995. Since 2000, the January effect
has not been on the list of the most frequent words. Instead, “momentum” and “cross-section” have ranked higher over the years.
In Panel C, the topic of “Commercial Banking”, we observe a rise in research interest in lending and networks, accompanied by a decline
in deposits.
Panel A: “CEO, Board, Director”
1976 1980 1985 1990 1995 2000 2005 2010 2015
corpor corpor manag manag manag control compani sharehold ceo
manag manag corpor sharehold sharehold ownership control ceo sharehold
control control sharehold corpor control manag sharehold corpor board
sharehold sharehold control control ownership compani corpor compani compens
compani compani compani compani corpor sharehold board board incent
ownership ownership ownership ownership compani corpor incent incent corpor
compens compens compens compens compens incent compens compens compani
incent incent incent incent incent compens ownership control director
plan vote vote manageri manageri board ceo govern execut
vote plan manageri plan outsid manageri manag director famili
Panel B: “Determinants of Stock Return”
1976 1980 1985 1990 1995 2000 2005 2010 2015
return return return return return return return return return
stock stock stock stock stock stock stock stock stock
month month month revers revers revers revers momentum momentum
season januari januari month month past momentum cross-sect cross-sect
januari season season season past low past revers revers
revers revers revers januari low momentum low low low
inconsist inconsist past past cross-sect month cross-sect past past
past past averag cross-sect season cross-sect month month predict
averag averag inconsist low januari explain explain explain month
anomali anomali cross-sect averag explain averag book-to-market averag explain
Panel C: “Commercial Banking”
1976 1980 1985 1990 1995 2000 2005 2010 2015
bank bank bank bank bank bank bank bank bank
deposit deposit deposit deposit system system system system system
system system system system regul competit competit competit regul
requir requir requir requir deposit regul regul regul competit
competit competit competit regul requir requir requir requir lend
regul regul regul competit competit deposit deposit lend requir
oper oper oper oper insolv oper lend deposit network
balanc balanc failur failur failur failur oper oper deposit
failur failur branch insolv oper lend failur network oper
branch branch balanc branch entri entri network failur interbank
Fig. 1: Log Likelihood versus Number of Topics
This figure reports the log-likelihood of the latent Dirichlet allocation (LDA) model under different
numbers of topics. A higher likelihood indicates that the LDA model fits the corpus better. The
maximum likelihood occurs at around 40 topics, including the topics of general sentences. In
implementing this approach, we find that some topics represent general sentences and
do not indicate a specific research interest. For example, a topic with keywords “relat”, “posit”,
“neg”, “associ” and “evid” may simply represent an often-used general sentence such as “we provide
evidence on a positive/negative relation/association”. Therefore, we choose 50 topics when
implementing LDA and exclude 12 general sentence topics from them.
Fig. 2: Rise and Fall of Each Topic Over Years
The figures present the historical evolution of topics’ popularity. The 38 topics are identified
by latent Dirichlet allocation (LDA) model. The methodology of LDA is detailed in Section
4.1. The horizontal axis represents the year of publication. The vertical axis represents the
average percentage distribution on a given topic across abstracts in a given year, and its value
can be interpreted as the topic’s popularity of research interest. It is computed as
$p_{it} = \left(\sum_{k=1}^{N_t} p_{itk}\right) / N_t$, where $p_{itk}$ is year-$t$-published abstract $k$’s percentage distribution on topic $i$, and $N_t$
is the total number of articles published in year $t$. For example, the average percentage
distribution on “CEO, Board, Director” across all abstracts rose from about 1.5% in 1980 to
about 2.5% in 2015. We observe that the research interest in “Financial System, Banking Crisis”
often spiked around or after the financial crises, such as the savings and loan crisis in the late
1980s and early 1990s. It grew even faster after the 2008 financial crisis. Research interest
in “CEO, Board, Director” has grown steadily over the past 40 years. Other topics that
attracted more attention include “Behavioral Finance”, “Central Bank, Monetary Policy”,
“Commercial Banking”, “Corporate Cash Holding”, “Hedge Fund, Mutual Fund”,
“International Capital Markets”, “Social Network and Cultural Effect”, “Venture Capital,
Entrepreneurship”, and “Volatility”. The research interest in topics like “Bond Term Structure”
and “Optimal Choice Model” has been shrinking. Note that the values for most topics
fluctuate widely in the 1970s and 1980s. Fig. A.1 plots the number of active
journals and articles in each year. In the 1970s and 1980s, there were fewer journals and articles,
so each yearly average is taken over fewer abstracts and is more volatile. Fig. A.2 plots the yearly publication numbers in The Journal
of Finance, Journal of Financial Economics, and Review of Financial Studies.
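As a worked illustration of the popularity measure in this caption, the sketch below averages a topic's share across all abstracts published in each year. The input layout (`year`, `topic_share`) and the function name `yearly_topic_popularity` are hypothetical, and the numbers are toy values rather than results from the paper.

```python
from collections import defaultdict

def yearly_topic_popularity(abstracts, topic):
    """Average share of `topic` across all abstracts published in each year.

    Implements p_it = (sum over k of p_itk) / N_t, where N_t counts every
    abstract published in year t, including those with zero share on the topic.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for abstract in abstracts:
        year = abstract["year"]
        totals[year] += abstract["topic_share"].get(topic, 0.0)
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(counts)}

# Hypothetical toy data: two abstracts in 2014, one in 2015.
abstracts = [
    {"year": 2014, "topic_share": {"IPO": 0.2}},
    {"year": 2014, "topic_share": {"Tax": 0.7}},
    {"year": 2015, "topic_share": {"IPO": 0.5}},
]

popularity = yearly_topic_popularity(abstracts, "IPO")
print(popularity)
```

Years with few abstracts produce averages over a small denominator, which is consistent with the more volatile values the caption notes for the 1970s and 1980s.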
Fig. 3: Fastest Growing and Shrinking Topics
Fig. 3.1 reports the fastest growing and shrinking topics from 2006 to 2015 in 17 journals.
Three topics on the left side grow fastest and three topics on the right side shrink fastest. The
topics are identified by latent Dirichlet allocation (LDA) model. The methodology of LDA is
detailed in Section 4.1. The horizontal axis represents the year of publication. The vertical axis
represents the average percentage distribution for a given topic across abstracts in a given year,
and its value can be interpreted as the topic’s popularity of research interest. It is computed as
$p_{it} = \left(\sum_{k=1}^{N_t} p_{itk}\right) / N_t$, where $p_{itk}$ is year-$t$-published abstract $k$’s percentage distribution on
topic $i$, and $N_t$ is the total number of articles published in year $t$.
Fig. 3.2 reports the fastest growing and shrinking topics from 2006 to 2015 in The Journal of
Finance, Journal of Financial Economics, and Review of Financial Studies. Three topics on
the left side grow fastest and three topics on the right side shrink fastest. The topics are
identified by latent Dirichlet allocation (LDA) model. The methodology of LDA is detailed in
Section 4.1. The horizontal axis represents the year of publication. The vertical axis represents
the average percentage distribution for a given topic across abstracts in a given year, and its
value can be interpreted as the topic’s popularity of research interest. It is computed as
$p_{it} = \left(\sum_{k=1}^{N_t} p_{itk}\right) / N_t$, where $p_{itk}$ is year-$t$-published abstract $k$’s percentage distribution on topic $i$, and $N_t$
is the total number of articles published in year $t$.
Fig. 4: Fastest Growing and Shrinking Topics on Working Papers Uploaded to SSRN’s
Financial Economics Network
This figure reports the fastest growing and shrinking topics from 2006 to 2015 in 130,547
working paper abstracts we collected from SSRN’s Financial Economics Network. Three
topics on the left side grow fastest and three topics on the right side shrink fastest. We apply
the latent Dirichlet allocation (LDA) model trained from published papers on the working
papers. The topics are generated by LDA model. The methodology of LDA is detailed in
Section 4.1. The horizontal axis represents the year of publication. The vertical axis represents
the average percentage distribution for a given topic across abstracts in a given year, and its
value can be interpreted as the topic’s popularity of research interest. It is computed as
$p_{it} = \left(\sum_{k=1}^{N_t} p_{itk}\right) / N_t$, where $p_{itk}$ is year-$t$-posted abstract $k$’s percentage distribution on topic $i$, and $N_t$ is
the total number of articles published in year $t$.
Fig. 5: Trend of Yearly Research Interest Concentration
This figure reports the trend of published articles’ research interest concentration. To measure
how broad an article is, we calculate the Herfindahl Index of each abstract: $H = \sum_{i=1}^{38} s_i^2$,
where $s_i$ represents the percentage distribution of the abstract on topic $i$. The horizontal axis
represents the year of publication. The vertical axis represents the average Herfindahl Index of
each article on the distribution over the 38 topics in a certain year. The 38 topics are identified
by latent Dirichlet allocation (LDA) model. The methodology of LDA is detailed in Section
4.1. Higher Herfindahl Index value means higher research interest concentration. The solid line
represents the average Herfindahl Index of abstracts in 17 journals, the dashed line represents
the average Herfindahl Index of abstracts in The Journal of Finance, Journal of Financial
Economics, and Review of Financial Studies. The average Herfindahl Index dropped sharply
from 1976 to 1982, perhaps because many topics’ pioneering works started to emerge during
the early period and therefore cross-topic research was more common. The two lines went up
between 1982 and 2000, indicating that on average research became narrower. The average
Herfindahl Index of abstracts in 17 journals continued to increase after 2000, while that in the
three top journals remained roughly constant and even declined after 2010, indicating
that the three top journals still publish broader, more cross-topic articles.
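The concentration measure in this caption can be illustrated with a short sketch. The function names and the two toy distributions are assumptions for the example; shares are written as fractions that sum to one over the 38 topics.

```python
def herfindahl(shares):
    """Herfindahl Index H = sum of squared topic shares for one abstract."""
    return sum(s * s for s in shares)

def mean_herfindahl(abstract_shares):
    """Average Herfindahl Index over a collection of abstracts (e.g. one year)."""
    values = [herfindahl(shares) for shares in abstract_shares]
    return sum(values) / len(values)

# Two hypothetical extremes over 38 topics:
focused = [1.0] + [0.0] * 37   # all weight on one topic, H = 1
even = [1.0 / 38] * 38         # weight spread evenly, H = 1/38

print(herfindahl(focused), herfindahl(even))
print(mean_herfindahl([focused, even]))
```

The index ranges from 1/38 for an abstract spread evenly over all topics to 1 for an abstract on a single topic, so a higher yearly average corresponds to narrower articles.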
[Figure: Research Interest Concentration. Vertical axis: Herfindahl Index (0.05 to 0.07). Horizontal axis: Year. Series: All Journals; The Journal of Finance, Journal of Financial Economics, and Review of Financial Studies.]
Fig. 6: Citation Network Between Topics
This figure demonstrates the citation network between finance topics, constructed from cross-
reference data of each article. There are 38 nodes and each of them represents a topic. The 38
topics are identified by latent Dirichlet allocation (LDA) model. The methodology of LDA is
detailed in Section 4.1. A node’s size is proportional to the number of articles that focus on the
topic that the node represents. As defined in Section 5.2, an abstract focuses on a topic if it has
over 10% distribution on it. Topics with more articles have larger nodes. The nodes are
connected through edges. An edge represents the cross-references between two topics, and an
edge is thicker if there are more cross-references. For example, if topic A has $N$ articles, these
articles cite articles in topic B a total of $\sum_{i=1}^{N} R_{iB}$ times, where $R_{iB}$ is the number of times that
article $i$ cites articles in topic B. Similarly, if topic B has $M$ articles, these articles cite
articles in topic A a total of $\sum_{j=1}^{M} R_{jA}$ times, where $R_{jA}$ is the number of times that article $j$ cites
articles in topic A. The total number of cross-references is then $\sum_{i=1}^{N} R_{iB} + \sum_{j=1}^{M} R_{jA}$, and the
thickness of the edge between A and B is proportional to it. The distance between two nodes
approximately represents how closely the two topics are related in terms of cross-references. We
conduct modularity analysis to categorize the topics into clusters, and 38 topics are
compartmentalized into 5 clusters, or “territories”: asset pricing, corporate finance, market
microstructure, banking and macro finance, and “mixed areas”. Each node’s color reflects the
territory it belongs to. The left side of this figure is clustered with corporate finance topics,
including large topics such as “CEO, Board, Director”, “Mergers and Acquisitions”,
“Shareholder Right, Ownership Structure”, and “IPO”. The bottom side is clustered with
banking and macro finance topics, including large topics such as “Commercial Banking”,
“Central Bank, Monetary Policy”, “Financial System, Banking Crisis”, and “Financial
Regulation”. The right side is clustered with asset pricing topics, including large topics such as
“Option Pricing”, “Volatility”, “Return Distribution and Value-at-Risk (VaR)”, and “Bond Term
Structure”. The center is clustered with market microstructure topics, including large topics
such as “Market Microstructure”, “Trader Behavior”, and “Information Asymmetry,
Disclosure, Insider Trading”. The upper side is clustered with “mixed areas”, including large
topics such as “Hedge Fund, Mutual Fund”, “News, Analyst Report, Earnings Announcement”,
“Behavioral Finance”, and “Statistical Estimation Methodology”.
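The edge weights described in this caption can be sketched as below. This is an illustrative reading, not the authors' implementation: the inputs (a list of `(citing, cited)` pairs and a mapping from article to the set of topics it focuses on) and the function name `edge_weights` are assumptions. The caption does not say how an article that focuses on several topics is counted; this sketch counts one cross-reference for every ordered pair of distinct topics, which double-counts a citation when both articles share the same two topics.

```python
from collections import Counter

def edge_weights(citations, topics_of):
    """Undirected cross-reference counts between topics.

    For topics A and B the weight is (citations from A-articles to B-articles)
    plus (citations from B-articles to A-articles).
    """
    weights = Counter()
    for citing, cited in citations:
        for a in topics_of.get(citing, ()):
            for b in topics_of.get(cited, ()):
                if a != b:
                    # frozenset makes the key order-free, so both directions add up.
                    weights[frozenset((a, b))] += 1
    return weights

# Hypothetical toy network: a1 is in topic A; b1 and b2 are in topic B.
topics_of = {"a1": {"A"}, "b1": {"B"}, "b2": {"B"}}
citations = [("a1", "b1"), ("a1", "b2"), ("b1", "a1"), ("b1", "b2")]

weights = edge_weights(citations, topics_of)
print(weights[frozenset(("A", "B"))])
```

Citations within a single topic, such as `b1` citing `b2`, do not contribute to any edge in this sketch.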
Fig. 7: Bibliometric Regularity: Number of Researchers Covering n Topics
This figure presents a bibliometric regularity: the number of researchers covering $n$ topics is
approximately $1/2^n$ of those covering only one topic. A researcher covers a topic if she
publishes at least one article with over 10% distribution on that topic. The 38 topics are
identified by latent Dirichlet allocation (LDA) model. The methodology of LDA is detailed in
Section 4.1. The horizontal axis represents the number of topics, and the vertical axis represents
the number of researchers. The solid line is generated from our data, which is downward
sloping because fewer researchers are able to cover more topics. The value of each point on
the line indicates how many researchers cover exactly that number of topics. For example, the first
point on the solid line is (1, 6830), meaning that 6830 researchers publish articles that focus on
just one topic. The second point is (2, 3507), meaning that 3507 researchers publish articles
that focus on exactly two topics. We use the dashed line $y = 13215/2^n$ to fit the solid line, where
$y$ is the number of researchers covering $n$ topics. When $n = 1$, $y = 6625.5$; when $n = 2$, $y =
3312.75$. The R-squared value of the fit is 0.998.
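The empirical line in this figure comes from counting, for each researcher, how many distinct topics she covers. A minimal sketch of that tally follows, with a hypothetical record layout (`authors`, `topic_share`) and toy data; it applies the "over 10%" rule from the caption as a strict inequality.

```python
from collections import Counter, defaultdict

def researchers_by_topic_count(articles, threshold=0.10):
    """Map n -> number of researchers covering exactly n topics.

    A researcher covers a topic if at least one of her articles has
    more than `threshold` of its topic distribution on that topic.
    """
    covered = defaultdict(set)
    for article in articles:
        focus = {t for t, share in article["topic_share"].items() if share > threshold}
        for author in article["authors"]:
            covered[author] |= focus
    # Researchers with no topic above the threshold are left out of the tally.
    return Counter(len(topics) for topics in covered.values() if topics)

# Hypothetical toy data.
articles = [
    {"authors": ["X", "Y"], "topic_share": {"IPO": 0.5, "Tax": 0.05}},
    {"authors": ["X"], "topic_share": {"Volatility": 0.3}},
    {"authors": ["Z"], "topic_share": {"Tax": 0.08}},
]

tally = researchers_by_topic_count(articles)
print(sorted(tally.items()))
```

Plotting the tally against n gives the solid line, which can then be compared with a halving curve of the form c/2^n.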
[Figure: Number of Researchers Covering n Topics. Horizontal axis: Number of Topics (n), 1 to 18. Vertical axis: Number of Researchers (y), 0 to 8000. Series: Empirical; 13215/2^n.]
Table A.1: Words that Appear in a Phrase at High Frequency
This table reports words that appear in a phrase at high frequency. Words presented here are
processed by stemming. For the last two three-word phrases, “initial public offering” and “chief
executive officer”, we combine them into “ipo” and “ceo” respectively in our textual data
cleaning.
Words Commonly Appearing Together
interest rate       limit order
unit state          fama french
exchang rate        foreign exchang
cross section       impli volatil
real estat          advers select
cash flow           cross border
monetari polici     capit structur
mutual fund         price discoveri
bid ask             brownian motion
mont carlo          random walk
short term          deposit insur
black schole        emerg market
abnorm return       institut investor
time seri           short run
transact cost       risk neutral
risk avers          financi distress
time vari           yield curv
term structur       agenc cost
inform asymmetri    feder reserv
corpor govern       asymmetr inform
financi crisi       cross list
hedg fund           ventur capitalist
busi cycl           standard deviat
moral hazard        tender offer
central bank        initi public offer
hong kong           chief execut offic
balanc sheet
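The phrase handling described for Table A.1 can be sketched as a pass over the stemmed tokens that replaces each listed phrase with a single token. The mapping of “initi public offer” to “ipo” and “chief execut offic” to “ceo” follows the table caption; the function name `merge_phrases` and the underscore-joined form used for the two-word phrase are assumptions made only for this illustration.

```python
def merge_phrases(tokens, phrases):
    """Replace each listed multi-word phrase in `tokens` with a single token."""
    merged, i = [], 0
    while i < len(tokens):
        for phrase, replacement in phrases.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                merged.append(replacement)
                i += n
                break
        else:
            # No phrase starts at position i, so keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged

PHRASES = {
    ("initi", "public", "offer"): "ipo",      # stated in the table caption
    ("chief", "execut", "offic"): "ceo",      # stated in the table caption
    ("interest", "rate"): "interest_rate",    # joined form is an assumption
}

tokens = ["chief", "execut", "offic", "time", "initi", "public", "offer", "interest", "rate"]
print(merge_phrases(tokens, PHRASES))
```

Merging phrases before fitting the topic model keeps words such as “public” or “rate” from being counted separately from the phrase they belong to.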
Table A.2: High-frequency Keywords in General Sentence Topics
This table presents 12 topics that represent general sentences. For example, a topic with keywords “relat”, “posit”, “neg”, “associ” and “evid” may
simply represent an often-used general sentence such as “we provide evidence on a positive/negative relation/association”. The topics are identified by
latent Dirichlet allocation (LDA) model. The methodology of LDA is detailed in Section 4.1.
No. 1 2 3 4 5 6 7 8 9 10
1 valu discount econom show present base fundament journal number multipl
2 test adjust run hypothesi data power statist reject mean deviat
3 relat posit neg level associ signific evid examin consist document
4 approach framework propos appli properti discuss present methodolog analysi practic
5 differ import role determin type across play structur rel characterist
6 larg small averag size year point sampl number rel period
7 empir evid support theori predict provid consist hypothesi theoret explan
8 increas chang decreas declin follow shift becom reduc experi rise
9 research studi may literatur previou recent due exist suggest argu
10 time period data set analysi studi observ show continu provid
11 perform measur base sampl indic compar improv better differ studi
12 effect impact studi signific affect show examin investig lead direct
Table A.3: Most Cited Articles in Each Topic (Google Scholar)
This table lists the most cited articles in each topic. The 38 topics are identified by the latent Dirichlet allocation (LDA) model. The methodology of
LDA is detailed in Section 4.1. Each entry gives the author or coauthors, followed by the year of publication in parentheses and then the number of
Google Scholar citations.
No. Topic 1 2 3 4 5
1 Option Pricing Heston (1993), 6953 Cox, Ross, Rubinstein (1979), 7371 Vasicek (1977), 6669 Merton (1976), 5694 Cox, Ross (1976), 3547
2 Commercial Banking Sharpe (1990), 2369 Barth, Caprio, Levine (2004), 2394 Boot (2000), 2205 Petersen, Rajan (2002), 1539 Berger, Miller, Petersen, Rajan, Stein (2005), 1503
3 CEO, Board, Director Yermack (1996), 6323 Weisbach (1988), 4846 Core, Holthausen, Larcker (1999), 4007 Amit, Villalonga (2006), 2814 Agrawal, Knoeber (1996), 3345
4 Market Microstructure Lee, Ready (1991), 2801 Copeland, Galai (1983), 2184 Hamao, Masulis, Ng (1990), 2210 Glosten, Harris (1988), 1730 Huang, Stoll (1996), 1125
5 Central Bank, Monetary Policy Eun, Shim (1989), 1778 Meese, Rogoff (1988), 939 Sercu, Uppal, van Hulle (1995), 458 Blanchard, Galí (2007), 703 Thorbecke (1997), 831
6 Mergers and Acquisitions Jensen, Ruback (1983), 6021 Morck, Shleifer, Vishny (1990), 2375 Bradley, Desai, Kim (1988), 2038 Moeller, Schlingemann, Stulz (2004), 1756 Shleifer, Vishny (2003), 2024
7 Return Distribution and Value-at-Risk (VaR) Rockafellar, Uryasev (2002), 2589 Cont (2001), 2004 Longin, Solnik (2001), 2316 Rubinstein (1994), 2038 Jackwerth, Rubinstein (1996), 1167
8 News, Analyst Report, Earnings Announcement Barberis, Shleifer, Vishny (1998), 4650 Fama, French (1995), 3669 Ikenberry, Lakonishok, Vermaelen (1995), 2047 Teoh, Welch, Wong (1998), 2412 Womack (1996), 1768
9 Hedge Fund, Mutual Fund Carhart (1997), 11220 Sirri, Tufano (1998), 2783 Daniel, Grinblatt, Titman, Wermers (1997), 2262 Wermers (1999), 1718 Wermers (2000), 1610
10 Shareholder Right, Ownership Structure Shleifer, Vishny (1997), 16697 La Porta, Lopez-de-Silanes, Shleifer (1999), 11945 La Porta, Lopez-de-Silanes, Shleifer, Vishny (1997), 9199 Claessens, Djankov, Lang (2000), 6107 La Porta, Lopez-de-Silanes, Shleifer, Vishny (2000), 6535
11 International Capital Markets Bekaert, Harvey (1995), 2409 Coval, Moskowitz (1999), 2054 Bekaert, Harvey (2000), 1895 Claessens, Demirgüç-Kunt, Huizinga (2001), 2061 Harvey (1995), 1663
12 IPO Loughran, Ritter (1995), 3998 Ritter (1991), 4494 Carter, Manaster (1990), 2761 Rock (1986), 3296 Megginson, Weiss (1991), 2451
13 Capital Structure, Bankruptcy, Leverage Smith, Warner (1979), 3445 Rajan (1992), 3737 Titman, Wessels (1988), 6004 Leland (1994), 2612 Deangelo, Masulis (1980), 3150
14 Macro Finance Schwert (1989), 3597 Estrella, Hardouvelis (1991), 1457 Constantinides, Ferson (1991), 644 Blanchard, Galí (2007), 703 McCallum, Nelson (1999), 733
15 Volatility Glosten, Jagannathan, Runkle (1993), 7183 Engle, Ng (1993), 4030 Andersen (2001), 1954 Pan (2002), 1424 Campbell, Hentschel (1992), 1772
16 Default and CDS Jarrow, Lando, Turnbull (1997), 1903 Longstaff, Mithal, Neis (2005), 1667 Blanco, Brennan, Marsh (2005), 1089 Bharath, Shumway (2008), 908 Crouhy, Galai, Mark (2000), 1327
17 Commodities, Futures Black (1976), 3201 Schwartz (1997), 2001 Gibson, Schwartz (1990), 1004 Fama (1984), 838 Stoll, Whaley (1990), 1030
18 Trader Behavior Admati, Pfleiderer (1988), 3528 Brunnermeier, Pedersen (2009), 3155 French, Roll (1986), 2173 Easley, O'Hara (1987), 2328 de Long, Shleifer, Summers, Waldmann (1990), 2675
19 Bond Term Structure Vasicek (1977), 6669 Fama, French (1989), 3611 Chan, Karolyi, Longstaff, Sanders (1992), 2111 Longstaff, Schwartz (1995), 2514 Leland (1994), 2612
20 Determinants of Stock Return Jegadeesh, Titman (1993), 8652 Fama, French (1996), 6375 Debondt, Thaler (1985), 7304 Amihud (2002), 5462 French, Schwert, Stambaugh (1987), 3995
21 Asset and Portfolio Allocation Demiguel, Garlappi, Uppal (2009), 1392 Jagannathan, Ma (2003), 875 Best, Grauer (1991), 712 Chopra, Ziemba (1993), 1031 Kim, Omberg (1996), 715
22 Asset Pricing Model Breeden (1979), 2882 Stulz (1981), 863 Diamond, Verrecchia (1981), 720 Breeden, Gibbons, Litzenberger (1989), 780 Sundaresan (1989), 560
23 Financial Regulation Barth, Caprio, Levine (2004), 2394 Karpoff, Lee, Martin (2008), 610 Marcus (1984), 607 Buser, Chen, Kane (1981), 641 Dahl, Shrieves (1992), 720
24 Statistical Estimation Methodology Petersen (2009), 6349 Barber, Lyon (1997), 2908 Dimson (1979), 2234 Stambaugh (1999), 1283 Hodrick (1992), 1180
25 International Asset Pricing and Foreign Exchange Hamao, Masulis, Ng (1990), 2210 Dittmar, Neely, Weller (1997), 674 Peel, Taylor (2000), 560 Lins, Servaes (1999), 672 Cheung, Chinn (2001), 514
26 Venture Capital, Entrepreneurship Sahlman (1990), 3117 Hellmann, Puri (2002), 1892 Hellmann, Puri (2000), 1321 Hsu (2004), 988 Gompers (1995), 2147
27 Industry Competition and Market Efficiency Claessens, Laeven (2004), 1193 Klapper, Laeven, Rajan (2006), 930 Berger, Deyoung (1997), 1353 Bonin, Hasan, Wachtel (2005), 1140 Gold, Sherman (1985), 1102
28 Tax Shefrin, Statman (1985), 2981 Lakonishok, Shleifer, Vishny (1992), 2019 Claessens, Demirgüç-Kunt, Huizinga (2001), 2061 Grinblatt, Keloharju (2001), 1140 Miller, Scholes (1978), 990
29 Financial System, Banking Crisis Rajan, Zingales (2003), 2809 Beck, Levine, Loayza (2000), 3469 Allen, Qian, Qian (2005), 2620 Hoshi, Kashyap, Scharfstein (1990), 1410 Faccio, Masulis, McConnell (2006), 1317
30 Multifactor Model Fama, French (1993), 19549 Fama, French (1992), 16423 Jagannathan, Wang (1996), 2369 Harvey, Siddique (2000), 1902 Daniel, Titman (1997), 2019
31 Dividend Policy La Porta, Lopez-de-Silanes, Shleifer, Vishny (2000), 2622 Fama, French (2001), 2779 Fama, French (2002), 3084 Brav, Graham, Harvey, Michaely (2005), 1817 Grullon, Michaely (2002), 1310
32 Information Asymmetry, Disclosure, Insider Trading Diamond, Verrecchia (1991), 2950 Easley, O'Hara (2004), 2381 Seyhun (1986), 1383 Froot, Scharfstein, Stein (1992), 1122 Blume, Easley, O'Hara (1994), 1218
33 Optimal Choice Model Grossman, Hart (1988), 1740 Admati, Pfleiderer (1994), 953 Harris, Raviv (1988), 815 Jorion (1986), 775 Kroll, Levy, Markowitz (1984), 554
34 Corporate Operational Structure and Value Creation Morck, Shleifer, Vishny (1988), 9026 Claessens, Djankov, Lang (2000), 6107 Almeida, Campello, Weisbach (2004), 2169 Campa, Kedia (2002), 1484 Coles, Daniel, Naveen (2008), 1937
35 Systematic Risk and Risk Premium Harvey, Siddique (2000), 1902 Acerbi, Tasche (2002), 1406 Harvey (1991), 1262 Laeven, Levine (2009), 1487 Ferson, Harvey (1993), 1135
36 Behavioral Finance Odean (1998), 3372 Shefrin, Statman (1985), 2981 Barber, Odean (2008), 2226 Grinblatt (2000), 1469 Lee, Shleifer, Thaler (1991), 1961
37 Corporate Cash Holding Opler (1999), 2694 Harford (1999), 1635 Bates, Kahle, Stulz (2009), 1503 Harford, Mansi, Maxwell (2008), 1151 Dittmar, Mahrt-Smith, Servaes (2003), 1130
38 Social Network and Cultural Effect Hong, Kubik, Stein (2004), 1165 Boss, Elsinger, Summer, Thurner (2004), 555 Hong, Kacperczyk (2009), 692 Blinder, Morgan (2005), 259 Brown, Ivković, Smith, Weisbenner (2008), 341
Table A.4: JEL Codes
This table lists the explanation of JEL codes used in Table 4.
C. Mathematical and Quantitative Methods
C1 Econometric and Statistical Methods and Methodology: General
C14 Semiparametric and Nonparametric Methods: General
C2 Single Equation Models • Single Variables
C22 Time-Series Models • Dynamic Quantile Regressions • Dynamic Treatment Effect Models • Diffusion Processes
C3 Multiple or Simultaneous Equation Models • Multiple Variables
C32 Time-Series Models • Dynamic Quantile Regressions • Dynamic Treatment Effect Models • Diffusion Processes • State Space Models
C5 Econometric Modeling
C53 Forecasting and Prediction Methods • Simulation Methods
C6 Mathematical Methods • Programming Models • Mathematical and Simulation Modeling
C61 Optimization Techniques • Programming Models • Dynamic Analysis
C63 Computational Techniques • Simulation Modeling
D. Microeconomics
D1 Household Behavior and Family Economics
D12 Consumer Economics: Empirical Analysis
D2 Production and Organizations
D24 Production • Cost • Capital • Capital, Total Factor, and Multifactor Productivity • Capacity
D8 Information, Knowledge, and Uncertainty
D81 Criteria for Decision-Making under Risk and Uncertainty
D82 Asymmetric and Private Information • Mechanism Design
E. Macroeconomics and Monetary Economics
E3 Prices, Business Fluctuations, and Cycles
E31 Price Level • Inflation • Deflation
E4 Money and Interest Rates
E43 Interest Rates: Determination, Term Structure, and Effects
E5 Monetary Policy, Central Banking, and the Supply of Money and Credit
E52 Monetary Policy
E58 Central Banks and Their Policies
F. International Economics
F2 International Factor Movements and International Business
F21 International Investment • Long-Term Capital Movements
F3 International Finance
F31 Foreign Exchange
F32 Current Account Adjustment • Short-Term Capital Movements
F36 Financial Aspects of Economic Integration
F4 Macroeconomic Aspects of International Trade and Finance
F41 Open Economy Macroeconomics
Table A.4: JEL Codes (Continued)
G. Financial Economics
G01 Financial Crises
G1 General Financial Markets
G10 General
G11 Portfolio Choice • Investment Decisions
G12 Asset Pricing • Trading Volume • Bond Interest Rates
G13 Contingent Pricing • Futures Pricing
G14 Information and Market Efficiency • Event Studies • Insider Trading
G15 International Financial Markets
G18 Government Policy and Regulation
G2 Financial Institutions and Services
G20 General
G21 Banks • Depository Institutions • Micro Finance Institutions • Mortgages
G22 Insurance • Insurance Companies • Actuarial Studies
G23 Non-bank Financial Institutions • Financial Instruments • Institutional Investors
G24 Investment Banking • Venture Capital • Brokerage • Ratings and Ratings Agencies
G28 Government Policy and Regulation
G3 Corporate Finance and Governance
G30 General
G31 Capital Budgeting • Fixed Investment and Inventory Studies • Capacity
G32 Financing Policy • Financial Risk and Risk Management • Capital and Ownership Structure • Value of Firms • Goodwill
G33 Bankruptcy • Liquidation
G34 Mergers • Acquisitions • Restructuring • Corporate Governance
G35 Payout Policy
G38 Government Policy and Regulation
J. Labor and Demographic Economics
J3 Wages, Compensation, and Labor Costs
J33 Compensation Packages • Payment Methods
M. Business Administration and Business Economics • Marketing • Accounting • Personnel Economics
M4 Accounting and Auditing
M41 Accounting
Fig. A.1: Yearly Journal and Article Numbers
This figure reports, for each year, the number of journals and the total number of articles
in our sample. The blue bars represent the number of journals. The orange line represents the
total number of articles. We exclude articles without abstracts.
[Figure: Number of Journals and Articles. Axes: Year; No. Journals (0 to 18); No. Articles (0 to 1600). Legend: No. Journals, No. Articles.]
Fig. A.2: Yearly Publication Numbers
This figure reports the yearly publication numbers of The Journal of Finance, Journal of
Financial Economics, and Review of Financial Studies in our sample from 1976 to 2014. We
exclude articles without abstracts.
[Figure: Yearly Publication Numbers. Axes: Year; Number (0 to 180). Legend: The Journal of Finance, Journal of Financial Economics, Review of Financial Studies.]