Modeling Oral Business History Data: An Application to Markets and CEO Communication
Prithwiraj (Raj) Choudhury Natalie A. Carlson
Dan Wang Tarun Khanna
Working Paper 18-064
Copyright © 2018 by Prithwiraj (Raj) Choudhury, Dan Wang, Natalie A. Carlson, and Tarun Khanna
Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.
Modeling Oral Business History Data: An Application to Markets and CEO Communication
Prithwiraj (Raj) Choudhury Harvard Business School
Natalie A. Carlson Columbia University
Dan Wang Columbia University
Tarun Khanna Harvard Business School
We shed light on oral business history data, widely available and yet underutilized, as a relevant and useful resource for strategy and management scholars. Specifically, we outline a novel methodology based on topic modeling and sentiment analysis to illustrate that such data can be used to generate insight about the relationship between a firm’s past financial performance and the language and tone of speeches by the firm’s CEO. Using 88 CEO interviews conducted between 2008 and 2017, archived at the publicly accessible Harvard Business School online repository titled “Creating Emerging Markets”, we employ our unique methodology to study how environmental factors such as unexpected market returns are related to the range of topics and sentiments expressed by CEOs’ spoken words. In doing so, we match the data from the CEO interviews with financial performance data for the set of firms led by those CEOs as well as market performance data from the countries in which these firms are headquartered. Our results suggest that greater abnormal returns just prior to the date of the CEO interview are positively correlated with certain emotions expressed by CEOs, such as surprise, fear, and anticipation, an indication that unanticipated firm performance heightens the emotional states of firm leadership, with possible consequences for subsequent strategy choices. Furthermore, greater abnormal returns also are positively correlated with a CEO’s tendency to focus their speech more on work-related topics rather than personal topics.
Introduction
Strategy and international business (IB) scholars have a rich tradition of engaging with
historical data (Dunning 1998, Vernon 1966, Kogut 1993, O’Sullivan 2001, Jones 2005).
Scholars such as Raymond Vernon, John Stopford and Geoffrey Jones were deeply influenced by
the work of business historians such as Mira Wilkins (Wilkins 1970, 1974). However, the
systematic investigation of historical evidence has since disappeared from the more recent
research agendas of strategy and IB scholars. According to Jones and Khanna (2006), historical
data can provide a meaningful complement to other cross-sectional analyses in studying many
conceptual issues that can shape the development of new theory in strategy research. In this
paper, we develop their argument and illuminate a novel methodology involving oral history
data, topic modeling, and sentiment analysis. In particular, we argue that oral history represents
a rich data source for strategy research that has been underexploited because of the absence of a
robust methodology to utilize such data. We develop and apply our approach to study a key
strategy question: how do contextual economic factors affect the topical and emotional content
of CEO communication?
Oral history data has been extensively used by historians, and business historians in
particular (Portelli 2009; Thomson 2017). However, to the best of our knowledge, the use of oral
history data has not permeated into the world of strategy scholarship. This is despite the
availability of rich oral business history archives from multiple public databases. For example,
even as early as the 1970s and 1980s, extensive collections of oral history data concerning
American business activity have been preserved and made available, including the oral history
project initiated by Ford Motor Company in collaboration with Columbia University and similar
projects at Sears, Roebuck and Company, Atlantic Richfield, and other large corporations
(Saretzky 1981).1 Although oral history data has been instrumental as a way of illuminating the
events of the past in historical narratives, one of the stumbling blocks in using oral history data
for strategy scholarship has been the absence of a robust methodology outlining how oral history
data might be used to inform the analysis of strategy research questions.
In this paper, we employ oral history data from the Harvard Business School
curated archive of interviews with CEOs titled “Creating Emerging Markets”. The archive
consists of a collection of oral history transcripts of interviews with the CEOs of 88
unique firms, recorded by researchers at the Harvard Business School between 2008 and
2017. Part of this data was used in a recent paper that employed inductive-deductive theory
building methods to study how firms, competing in emerging markets, are able to survive over
the long term (Gao et al., 2017). Whereas Gao et al. (2017) develop insights through traditional
qualitative analysis, we utilize the same data to demonstrate a new and complementary
quantitative approach to qualitative data. Specifically, we outline and employ a novel
methodology based on state-of-the-art topic modeling to generate dependent variables related to
the dispersion of topics and sentiment of the content discussed by CEOs in these interviews.
Our choice of topic modeling follows recent strategy scholarship that has increasingly
used the technique to take advantage of the availability of text-based data about firm activity
(Kaplan and Vakili 2015). In short, topic modeling offers a systematic way of quantitatively
measuring the prevalence of a set of topics, each represented as a set of keywords, across the
content of a collection of documents. We apply topic modeling to oral
history interview transcripts associated with a set of business leaders in emerging economies.
1 As the author notes, even as far back as in 1979, Theresa McHugh Palmersheim, graduate student at the American University, found that only one-fifth of the 500 largest American firms had archives and out of the fifty-five firms responding to her questionnaire, twenty indicated the inclusion of oral histories.
Because the textual structure of oral history interview transcripts is unique, one of our main
contributions is to describe a replicable approach for preparing such data for analysis. We argue
that with our approach or similar ones, strategy scholarship can benefit from exploiting rich
databases that contain interview-based oral business history to explore how firm level variables
might affect an executive’s interpretation of her leadership role and influence in a firm.
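As a rough illustration of what preparing an interview transcript for topic modeling can involve, the following Python sketch keeps only the interviewee's turns and reduces them to a bag of words. It is not the procedure used in this paper: the "SPEAKER: text" transcript layout, the short stopword list, and the function names interviewee_tokens and bag_of_words are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "i", "was", "that", "it"}

# Assumed layout: a new turn starts with "SPEAKER NAME:" at the beginning of a line.
# Lines without such a prefix are treated as a continuation of the current turn.
TURN = re.compile(r"^([A-Z][A-Za-z .'-]*):\s*(.*)$")


def interviewee_tokens(transcript, interviewee):
    """Return lowercase word tokens spoken by the interviewee only."""
    tokens = []
    speaker = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            speaker, text = match.group(1), match.group(2)
        else:
            text = line  # continuation of the current speaker's turn
        if speaker == interviewee:
            words = re.findall(r"[a-z']+", text.lower())
            tokens.extend(w for w in words if w not in STOPWORDS)
    return tokens


def bag_of_words(tokens):
    """Count how often each token occurs; this is the usual input to a topic model."""
    return Counter(tokens)


sample = """INTERVIEWER: How did the firm grow?
OBEROI: We opened hotels in India.
The regulations were difficult.
INTERVIEWER: And the government?
OBEROI: Things are better now."""

tokens = interviewee_tokens(sample, "OBEROI")
counts = bag_of_words(tokens)
```

Dropping the interviewer's questions matters here because the dependent variables are meant to reflect the CEO's own choice of words, not the prompts they were given.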
To demonstrate our approach with a relevant empirical question, we employ our novel
methodology and data to shed light on a question that strategy scholars have long studied: how
are contextual economic factors and environmental uncertainty related to the topical and
emotional content of CEO communication? Past work has theorized about a link between the
content and tone of a CEO’s communication and the immediate economic and environmental
factors that have shaped the performance of a CEO’s firm (Lefebvre et al., 1997; D’Aveni and
MacMillan, 1990). More broadly, our study also relates to other work that has focused on factors
that influence the cognitive frames of top managers (Tripsas and Gavetti, 2000; Hambrick 2007;
Kaplan, 2008), as well as the wider literature on situated attention and the environment, within
which managers make decisions (Barnard 1938; Ocasio, 1997).
Given that the focus of our paper is methodological – that is, we introduce a novel
method to incorporate oral history data in strategy research – we do not claim to provide causal
empirical results around our research question. Instead, we correlate dependent
variables related to CEO communication, which we create through topic models of oral history
data and which reveal the emotional state of our CEOs and their attention to work or personal
topics, with independent variables constructed through an event study analysis based on
firm-level abnormal returns to stock price (Jacobsen 1988; Fama and French 1993; Barber and Lyon 1997).
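To make the idea of an abnormal return concrete, the sketch below computes a cumulative abnormal return (CAR) under the market model, one common event study specification. The passage above does not say which specification the analysis uses, so the market model, the window lengths, the sample numbers, and the function names are assumptions for illustration only.

```python
def ols_market_model(firm_returns, market_returns):
    """Estimate alpha and beta in r_firm = alpha + beta * r_market by least squares."""
    n = len(firm_returns)
    mean_f = sum(firm_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (f - mean_f)
              for f, m in zip(firm_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_f - beta * mean_m
    return alpha, beta


def cumulative_abnormal_return(est_firm, est_market, evt_firm, evt_market):
    """Sum of (actual - expected) firm returns over the event window.

    Expected returns come from a market model fitted on the estimation window.
    """
    alpha, beta = ols_market_model(est_firm, est_market)
    abnormal = [f - (alpha + beta * m) for f, m in zip(evt_firm, evt_market)]
    return sum(abnormal)


# Made-up daily returns: the estimation window follows r_firm = 0.001 + 1.5 * r_market.
est_market = [0.01, -0.02, 0.03, 0.00]
est_firm = [0.001 + 1.5 * m for m in est_market]

# Event window just before a hypothetical interview date.
evt_market = [0.01, 0.02]
evt_firm = [0.036, 0.031]  # first day is 0.02 above the model's prediction

car = cumulative_abnormal_return(est_firm, est_market, evt_firm, evt_market)
```

In this toy case the firm beats its expected return by two percentage points on the first event day and matches it on the second, so the CAR is about 0.02.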
We argue that understanding the factors that shape a CEO’s attention and emotional state
– which can be revealed in an analysis of their language in their own spoken words – is an
important complement to existing work that links a CEO’s emotions and subjective state of mind
to their imminent strategic choices. Namely, Delgado-Garcia and De La Fuente-Sebate (2010)
argue that CEOs who display positive long-term affective traits, such as optimism or
anticipation, tend to be associated with strategic choices that conform less to industry standards.
Related studies have also investigated the relationship between certain aspects of a CEO’s
personality, such as narcissism, extraversion, or openness, and subsequent strategic decision-making
(Chatterji, et al 2010, Herrman and Hadkarni 2013, Gamache, et al 2015, Hiller and
Hambrick 2005).
Our results show a remarkably robust relationship between the emotions and type of
content discussed by a CEO and the broader economic factors that have recently shaped the
performance of the CEO’s firm. Specifically, larger cumulative abnormal returns for a CEO’s
firm are associated with greater anticipation, surprise, and fear in a CEO’s language in the
interview transcripts we analyze. In other words, unexpected shocks to firm performance can
position CEOs in an elevated emotional state. In addition, greater cumulative abnormal returns
also predict that CEOs concentrate more on work-related topics in their interviews, indicating
that unexpected positive patterns in their firms’ recent performance can influence CEO attention.
Importantly, the transcript data we use come from open-ended, semi-structured interviews; thus,
the topics that the CEOs in our dataset decide to address in the course of an interview are
entirely their own decisions. In summary, our findings illustrate a vital, but under-examined,
pathway through which unanticipated changes in a firm’s performance might affect its future
strategy: namely, exogenous environmental factors that affect a firm’s performance in
unexpected ways arguably produce shifts in the firm’s strategy because they can provoke
changes in the emotional state and attention of the firm’s CEO.
Our analytical approach to making sense of oral histories based on transcripts of CEO
interviews has several methodological advantages over more conventional means of interpreting
and summarizing written communication data. Because we adopt a systematic quantitative
approach to measuring the emotional and topical content of a CEO’s oral communication, our
findings are highly replicable in other datasets. Moreover, our results are not contingent on the
unobserved subjective biases of human coding, which has conventionally been used to prepare
qualitative data for analysis. Similarly, because our method relies on machine learning, our
approach to analyzing the intricacies of text-based data reduces the often high cost and effort
imposed by human coding. Thus, we lower the barrier to entry for future researchers who might
wish to adapt our techniques for exploiting similar data.
Our methodology and results are relevant for several literatures in strategy, notably the
nascent literature that argues for the incorporation of historical data into strategy research (Jones
and Khanna, 2006; Gao, et al 2017). Our methodology and results are also relevant to the
literature that focuses on CEO attention (Daft et al., 1988; Calori et al., 1994; Lefebvre et al.,
1997; Yadav et al., 2007), the broader literature on managerial attention (Ocasio, 1997, 2011),
and the literature on managerial interpretation and managerial cognition (Barr 1998; Tripsas and Gavetti, 2000;
Kaplan, 2008). Given that CEOs spend a large share of their time communicating within and
outside the organization, we also add to the literature on how CEOs spend their time (Porter and
Nohria, 2010; Bandiera et al., 2017). Finally, by highlighting the influence of contextual factors
on the quality and content of qualitative data, we also make a contribution to the long-standing
tradition focused on using qualitative data in strategy research (Hatch and Schultz 2017, Corbin
and Strauss 2008, Eisenhardt 1991, Barley 1990).
Theory: CEO Communication and Contextual Factors
The Chief Executive Officer (CEO) occupies arguably the most central and important leadership
role at any firm, being principally charged with the obligation to set firm strategy (Hambrick and
Mason, 1984). One of the most important ways that a CEO might influence firm strategy is by
communicating her ideas to internal and external stakeholders (D’Aveni and MacMillan, 1990;
Lefebvre et al., 1997; Yadav et al., 2007).
Starting with Mintzberg (1987), firm strategy has been characterized as abstractions in
the minds of managers. Calori et al. (1994) characterized the CEO as a “cognizer”, an individual
who integrates views in the top management team and communicates the integrated view to
internal and external stakeholders. Building on this work, the subsequent strategy literature has
studied the effect of CEO communication on firm level outcomes. In a relatively recent study,
Yadav et al. (2007) coded CEO communication using letters to shareholders that were featured
in firms’ annual reports.2 Using these data, the authors showed that certain features of CEO
communication – specifically having greater internal and external focus – can have a “positive
and long term impact on how firms detect, develop and deploy new technologies over time”
(Yadav, et al 2007: 84). Other more recent work has utilized similar data, such as CEOs’ written
diaries, to investigate how executives divide their attention among various firm and personal
activities (Bandiera, et al 2013).
2 The authors used data from the letters from the CEO to shareholders to code several explanatory variables such as “future focus” (coded using the frequency of the word ‘will’ in the letters); “external focus” (coded using the frequency of words in the letters, denoting outward attention to customers and competitors) and “internal focus” (coded using the frequency of words in the letters denoting the inward attention to organization specific issues).
A related literature in strategy and organizations studies how contextual and
environmental factors relate to CEO communication. In the strategy literature, Hambrick and
MacMillan (1985: 529) stated that “context refers to the environment and broad organizational milieu
in which the innovative attempt is situated”. Hambrick
and MacMillan (1985) take cues from Duncan (1972), who suggested that the “environment”
consists of the relevant physical and social factors outside the boundary of an organization that
are taken into consideration during organizational decision making (Daft et al., 1988: 124).
Specifically, Daft et al. (1988) argue that contextual uncertainty increases information processing
within firms and, faced with this, the CEO must identify and interpret problems and
opportunities and accordingly communicate strategic adaptations to changing environmental
conditions.
Lefebvre et al. (1997) extend this line of reasoning one step further to argue that it is the
CEO’s perceptions of the environment, not the objective realities of the environment, that shape
CEO communication and firm strategy. Here, they introduce the term “prism effect” to describe
how the objective realities of the external environment are shaped by the personal biases of the
CEO. This “prism effect”, in turn, influences CEO communication and acts as a moderator for
how a firm’s technology policy might affect its realized innovative efforts. In the same literature,
D’Aveni and MacMillan (1990) compared senior managers’ letters to shareholders during
demand-decline crises for 57 bankrupt firms and 57 matched survivors. The authors found that
under environmental uncertainty, not only do the CEOs of surviving firms pay disproportionate
attention to the output environment of the firm, but their communication to shareholders also
more strongly reflects these structural differences in their attention.
In many of the studies we have reviewed on CEO communication and CEO attention, the
workhorse methodological tool has been content analysis of written communication by the CEO.
More broadly, content analysis of written communication, such as CEO letters to
stakeholders, has a long tradition in strategy research (Watzlawick et al., 1967;
Salancik and Meindl, 1984). Although written communication has several strengths that lend
themselves well to content analysis, a glaring omission is the absence of analysis using oral communication
records by the CEO.3 Pfeffer (1981) argued that the content analysis of language helps provide
evidence of the origination of problems within the organization and organizational responses to
such problems. Arguably both written and oral communication should reflect the perceived
realities of CEOs, as articulated by Lefebvre et al. (1997). In addition, in a recent paper, Helfat
and Peteraf (2015) outline several characteristics of oral communication by CEOs and the effect
such communication might have on individual workers and firm strategy:
The communication style of top managers in general, and the way in which they communicate a vision for the organization in particular, can inspire workers, encourage initiative, and drive entrepreneurial growth (Baum, Locke, and Kirkpatrick, 1998; Wesley and Mintzberg, 1989). Managerial skill in using language, such as through impromptu talks, flow of words, and articulation in conversation, may affect worker response to change initiatives (Helfat and Peteraf, 2015; page 843).
Despite the realization of the value of data embedded in CEOs’ oral communication, to
the best of our knowledge, there are no studies in the strategy literature that use the oral
communication records of CEOs as a main data source to better understand the relationship
between CEO communication, attention, and firm outcomes. We suggest that this rich data
source has been underexploited by strategy scholars in the absence of a robust methodology to
3 However, some work in accounting has examined firms’ tendency to engage shareholders through quarterly conference calls, often drawing upon the transcripts of some of those calls, which have been analyzed using topic modeling, among other methods (Tasker 1998, Lehavy, et al 2011, Larcker and Zakolyukina 2012, Huang, et al Forthcoming). For a more comprehensive review of the accounting literature, please see Loughran and McDonald (2016).
utilize such data. To address this oversight, in the following sections we develop a novel
methodology to utilize oral history data (oral interview data of CEOs) and use these data to study
the effect of contextual factors on CEO communication.
Unique Features of Oral Business History Transcripts
Our primary data come from oral histories in the form of interview transcripts with
business leaders. Oral histories occupy a unique space among historical data sources.
Most primary source data, such as book-keeping notes or census records, are sought by historians
because they form a legitimate basis for describing objective reality from the past. However, the
value of oral history is precisely the opposite. Because oral histories are collected by recording
an individual’s thoughts in a certain context about certain topics, even when reconstructing a
narrative of events, subjectivity is a salient and inherent feature of oral history data. Transcripts
of oral histories reflect a speaker’s thoughts and biases “in the moment” of their delivery
whereas written communication records – such as letters to shareholders – are deliberately
planned, revised, and are often influenced by the different stakeholders involved in drafting a
written message. According to Portelli (1981: 100-01), oral histories reveal the dispositional
states of speakers at a given point in time, often in relation to “not just what people did, but what
they wanted to do, what they believed they were doing, what they now think they did.” In other
words, an oral history offers a glimpse into the psychological stakes of a particular memory that
a speaker recalls.
Aside from the value of subjectivity, oral histories from business leaders are a
particularly useful resource for organizational scholars because they can reveal not only
insight into the events and activities leading up to important business decisions – which are often
undocumented – but also a speaker’s interpretation of and feelings about those events. Keulen and
Kroeze (2012) identify three unique advantages of using oral history data from organizational
leaders for piecing together and making sense of business history. Namely, they characterize
“oral business history” as 1) “archival”, 2) “scientific”, and 3) “democratic”. We describe each
of these features below, arguing that each of these features makes oral business history data
attractive as a resource for strategy scholarship.
Oral business history, itself, refers to oral histories about the activities of business
organizations (Perks 2010). The analysis of oral history data primarily comes in the form of the
content analysis of the transcripts associated with oral records. Because many institutional
research efforts to collect oral history data more broadly have been conducted with the aim of
giving voice to strata of society that are typically oppressed, oral business histories in particular
are relatively rare as a data source because few historians have sought to document the oral
statements of the elites that are seen as comprising the voices of business activities. As such,
Keulen and Kroeze (2012) argue that oral business histories are uniquely “archival” because they
are often the only way to garner information about why certain decisions were made, given that
many firm decisions often lack paper trails or other documentation.
In addition, oral business histories are also “scientific” as data sources because they are
often collected via a semi-structured or unstructured interview with the leaders of an
organization. Therefore, Keulen and Kroeze (2012) use the term “scientific” to describe the
validity of oral history data as an accurate rather than false representation of a speaker’s
impressions and thoughts, given that speakers are given free rein to express themselves. The
execution of a relatively open-ended interview necessitates that an interviewer conduct
background research on the interviewee, which gives the impression that both the
interviewer and interviewee should take the conversation seriously, thus influencing interviewees
to speak more candidly during a recording.
Finally, oral business histories are “democratic” because although business leaders often
appear in the public spotlight, given that they are beholden to multiple powerful stakeholders,
business leaders also often feel that they cannot speak freely (Abrams 2010: 161). As such, oral
histories that are drawn from interviews with executives have a tendency to liberate the personal
sentiments and thoughts of business leaders who might find it lonely at the top. In other words,
CEOs and other business leaders often see an “oral history interview as an opportunity to tell
their side of the story” (Keulen and Kroeze 2012).
Oral business histories are therefore ideally suited as a data source for exploring how
contextual factors affect the communication style and content of CEOs. Unlike official written
communications from CEOs, which convey the premeditated and largely edited views of a firm’s
leadership, oral business histories offer insight into the subjectivity of CEO communication and
language, which can arguably affect how CEOs make decisions. As a result of their candidness,
oral histories collected through interviews are arguably one of the most valuable sources of data
on a CEO’s sensitivities, emotions, tastes, and opinions, but one that is also vastly under-
exploited.
In addition, we emphasize the value of oral histories related to business decisions because
an enormous institutional effort has been undertaken to make such data freely available and
accessible. Table 1, for example, reports a list of selected archives, mostly housed in university
libraries, that contain a diverse array of interviews with business leaders, covering a wide range
of industries, regions, and topics. One notable resource is Columbia University’s Oral History
Archive, which has been widely acknowledged as the largest searchable database of oral history
records in the world, giving access not only to audio and video records of interviews with
business executives, but also to their accompanying transcripts. Given the rich tradition in
strategy research of forming explanations of firm behavior based on an understanding of the
dispositional or emotional state of firm leadership, we argue that it is imperative for strategy
scholars to exploit these archives to inform novel theory building.
--------------------------------------------------------------------------------- INSERT TABLE 1 ABOUT HERE
---------------------------------------------------------------------------------
Importantly, we make a distinction between the analysis of transcripts of oral business
histories and the analysis of the audio and visual content of oral business histories. In analyzing
transcripts of interviews with CEOs below, we focus on the explicit language used by
interviewees in sharing their thoughts, but we cannot incorporate measurements of physical
appearances or aural indicators, which also contain information about a CEO’s emotional state or
attention. Sociolinguists and computer scientists have developed means for analyzing features of
speech, such as prosody and pitch, and correlating them with emotional variation and
meaning. Deploying similar techniques offers a promising pathway for future researchers to take
even greater advantage of oral business history records. However, to bring greater focus to our
analysis, we limit the scope of our methodological description and demonstration to transcripts.
Data: ‘Creating Emerging Markets’ Oral History Archives at Harvard Business School
Our oral history data come from the ‘Creating Emerging Markets’ archive, which
contains video recordings of interviews with 88 CEOs of multinational companies headquartered
in emerging economies. The interview transcripts, which we used as our primary input data,
were downloaded from the interviews section of the Harvard Business School’s “Creating
Emerging Markets” initiative.4 The objective of this initiative is to analyze the evolution of
business leadership in Africa, Asia and Latin America.5 As the curator of the oral history
archives describes on the website: “Creating Emerging Markets explores the evolution of
business leadership in Africa, Asia, and Latin America.
At its core are interviews, many on video and by Harvard Business School faculty
members, with leaders or former leaders of businesses and NGOs. These interviews, with men
and women of diverse backgrounds, address pivotal moments of transition in their organizations.
They contain compelling insights on entrepreneurship, innovation, family business, and the
globalization of firms and brands. Emphasizing ways that businesses can create value for their
societies, the project provides a unique resource for research and teaching.” According to the
website for the archive (see footnote 5), it was envisioned as a public goods project, designed to
be available to scholars and educators worldwide.
The interviews themselves were conducted by Harvard Business School faculty with the
leaders or former leaders of businesses and non-governmental organizations and particularly
focus on significant moments of transition in their regions. Although the interviews were largely
structured around some “core themes”, they were also conducted with an open-ended approach
by design.6 Currently, the archive contains interviews with 88 CEOs, which were conducted
between 2008 and 2017. The length of each interview ranged between one and two hours, and
each interview was transcribed and approved by the interviewed CEO prior to public distribution through
the archive website. While some of these interviews were held on the Harvard Business School
4 These interviews can be found at: http://www.hbs.edu/creating-emerging-markets/interviews/Pages/default.aspx. Accessed on July 31, 2017.
5 A more complete description can be found at: http://www.hbs.edu/creating-emerging-markets/about/Pages/default.aspx. Accessed on July 31, 2017. This project is physically located in Baker Library at the Harvard Business School.
6 http://www.hbs.edu/creating-emerging-markets/research/Pages/default.aspx. Accessed on July 31, 2017.
campus, most were conducted across the world, typically in the city in which the firm or
organization that the CEO represented was based. The project intentionally sought out CEOs as
interviewees who were over the age of 60 years to reduce informant bias, given that older
informants could be more frank as their words no longer affected their career prospects (Gao, et
al 2017). Examples of the firms that the interviewed CEOs represented included the Tata Group,
Claro y Cía and the United Bank for Africa. The interviewees represented 34 organizations from South
America, 10 from Central America, 26 from Asia, and 18 from Africa.7
Topical and Emotional Variation in CEO Oral Histories
In analyzing the transcripts of the CEO interviews in the “Creating Emerging Markets”
archive, we aim to illustrate how the language used by executives might reflect their deliberate
and endogenous choices about the particular set of topics to discuss or emotions to express, two
elements that, we argue, might be affected by shifts in the broader economic environment in
which our CEOs’ firms are situated. Qualitative analysis of interview transcripts and other forms
of natural language data typically relies on the subjective interpretations of individual researchers
to piece together a narrative or description as a way of informing a particular theoretical
explanation. For example, using the same set of interviews that we analyze in our study, Gao, et
al (2017) identify key passages to support their theorization of how leaders in emerging markets
utilize the reputation of their firms to overcome the institutional voids that characterize emerging
markets (Khanna and Palepu, 2010).
7 As Gao et al. (2017) document, the interviews began with open-ended questions asking the informant to describe his or her firm’s development from inception to the present. The firms were chosen using a theoretical sampling of the most successful firms in each country. As they further note, while this sampling technique led to selection of the most successful firms in each emerging market, such “theoretical sampling” also lends itself well to theory development (Eisenhardt and Graebner 2007; Pratt, 2009; Eisenhardt, Graebner and Sonenshein, 2016).
We argue that even in discussing objective historical events related to their companies,
speakers vary in the extent to which they, for example, concentrate on how those events relate to
either their firms’ activities or their own personal careers. In addition, speakers also express
different emotions in their choice of language in describing the past. As such, we emphasize
that understanding how oral histories are expressed is just as important as what oral histories
describe about the past.
For example, in almost every transcript, CEOs describe the challenges they faced in
growing their businesses, often attributing such barriers to government regulations or market
conditions. However, in characterizing these barriers, speakers used language that ranged from
conveying fear to expressing anticipation. Prithvi Raj Singh Oberoi – the CEO of EIH Limited,
a luxury hotel chain in India – recounted the history of government regulations and corruption
regarding travel and tourism in India as a barrier to the growth of the country’s hotel industry.
However, after recalling several key events that made it difficult for hotel operators to survive in
India, Oberoi also conveyed a sense of both anticipation and fear:
Well, I think enough has been said about corruption and red tape in India. I won’t dwell on that too much because everybody knows what has happened in the past, a lot has been written and in the media, television, and the newspapers and magazines every year we see—every month in fact we see some scam or other…. Things are better now…I think big businesses understand, all businessmen are understanding now, and in the long run [this corruption] doesn’t pay. But is there realization amongst the government [officials]? I think there is now. And Rome was not built in a day; we have been through a lot of problems in this country in the last 60 years.8
On the one hand, Oberoi signals optimism for the future of the growth of the hotel
industry in India, noting that issues regarding the “corruption” and “red tape” in government
regulations have improved over the past several decades and that even business leaders have
8 Interview with Prithvi Raj Singh Oberoi, Executive Chairperson, EIH Limited. Conducted by Ryan Buell and Ananth Raman on August 15, 2015. Video can be found at http://www.hbs.edu/creating-emerging-markets/interviews/Pages/profile.aspx?profile=prsoberoi (last accessed: September 27, 2017).
adapted. However, Oberoi also conveys hesitation about the future in considering the amount of
time that it has taken for adherence to regulations to improve. In other words, although there
certainly exists a subjective emotional tone in Oberoi’s words, identifying the precise mood of
these words would represent a subjective judgment that might depend on the disposition and
biases of the reader.
The ambiguity of the emotional valence of Oberoi’s reflection can also be found in the
remarks of Jim Damalas – CEO of Greentique Hotels, an upscale chain of hotels in Costa Rica –
who similarly spent a large proportion of his interview describing the challenges to growth in the
hotel industry in Costa Rica. Specifically, Damalas outlines the unique features of Costa Rica as
a tourist destination, recalling the history of events involving the government’s effort to protect
national parks and to preserve other indigenous ecosystems. Importantly, the success of his hotel
chain was tied to articulating a value proposition that hewed closely to the principles of eco-
tourism. In this context, however, Damalas expressed both trepidation and hope about the future
of competition in the hotel industry in Costa Rica, due to the emergence of international chains,
which he felt were not as keen to adapt to the local market:
Four Seasons as a corporation has a bigger budget than Costa Rica ever will have—to market a country. They have [more than that] to market a chain…So, if Costa Rica doesn’t understand how to grasp that and control it, because there should be a playing field for all types of market niches and for all types of businesses, what’ll happen is the same thing that Walmart’s doing and all the other big boys: they’re taking all us guys out. And when you do that, in a country based on individuality, pride, self-esteem—when you took out the military, in a country that really hasn’t had invasions or slavery for so long, and you bring in too many people from the outside that aren’t in love with the country, and that may not be in love with the people, you’ll change the vibe, and we’re seeing that in some of the big boutique properties, from what I hear.9
9 Interview with Jim Damalas, Founder and CEO, Greentique Hotels. Conducted by Andrew Spadafora on June 4, 204. Video can be found at http://www.hbs.edu/creating-emerging-markets/interviews/Pages/profile.aspx?profile=jdamalas (last accessed: September 27, 2017).
In both interviews, Damalas and Oberoi recall with vivid detail the specific conversations
and events that color the stories of their companies’ growth trajectories. The parts of their
interviews that contain their reflections and pontifications, however, are just as important in
establishing their subjective perspectives about the events that they identify as key moments in
their firms’ histories. Thus, although oral histories serve as valuable sources of facts, which
might be used to triangulate, verify, or deepen the interpretation of an existing narrative, they
also contain a layer of information about a CEO’s overall attitude and viewpoint. Importantly,
however, there is subjectivity in how readers might interpret the emotional tone of a CEO’s
spoken words, which presents a barrier to reliably measuring the emotional content of an
interview transcript.
Thus, although past work has established the importance of capturing a CEO’s state of
mind and their patterns of attention as an important factor in understanding how a firm’s
leadership relates to its strategic decision-making, the nuances of natural language in an
interview transcript that might reveal a CEO’s emotional disposition, for instance, are not
uniformly detectable by human readers. For example, in the interview excerpts above, different
readers might have different interpretations of the CEOs’ emotional valence on account of the
ambiguity of their language in these passages. However, it is clear that through their own spoken
words, CEOs do indeed convey their feelings and personal sentiments even when recounting
events that are intended to form an objective historical reality. The empirical challenge for
researchers is to systematically measure the type and extent of the emotions or other dimensions
of topical attention without relying on the subjective interpretations of human readers. Our aim
is to introduce a replicable, quantitative approach for operationalizing a CEO’s attention to
certain topics and emotional state when articulating details about past events.
Methodology
Overview
We argue that oral histories are rich data sources that are underutilized, in part, because
they are often long, unwieldy, and difficult to parse and summarize in a systematic way (Shaw
2015). Interview transcripts are additionally complicated by a distinct “turn-taking” structure.
Here, turn-taking refers to the pattern in which an interviewer’s question is followed by the
interviewee’s response, which is then followed by an interviewer’s question, and so on.
Topic models – a class of statistical models that “discover” the fixed set of underlying
topics in a collection of documents – provide an opportunity to reveal the subjects discussed in
an oral history transcript. Colloquially, a “topic” refers to a theme or a subject matter; in topic
modeling, these subject matters are represented by an ordered probability weighting of words. As
a simple example, a topic model for a set of documents about local government services might
produce a topic with its highest probability weights on the terms “car”, “train”, “drive”, and
“fly”, leading the researcher to infer that the topic refers to the subject of transportation. These
models provide an unstructured way of describing what subjects occur in a set of texts,
determined by which terms are most likely to co-occur within documents.10 Within this group of
models, Latent Dirichlet Allocation (LDA), the model we employ in this analysis, is one of the
simplest and most broadly used (Blei, Ng et al. 2003).
In the subsequent sections, we describe in detail the underpinnings of the LDA model and
the procedures we use to employ it, but as a broad overview, Figure 1 provides a visual roadmap
of the approach. The topic modeling process (steps A1-A7) involves several cleaning and
10 Computational linguists call this approach to topic modeling “unsupervised”, which refers to the notion that humans do not associate any words in the documents being analyzed with a pre-specified label for a topic prior to estimating the topic model.
preprocessing procedures, in which the transcripts are cleaned (see previous section), certain
superfluous text and characters are removed, and the transcripts are split into segments (A1-A3).
Following these steps, the LDA sampling algorithm is performed (A4) and the resulting topics
are examined (A5). These two steps may be repeated until a final model is selected. At this stage,
each topic that is output by the model is represented by a set of words, in which each word is
assigned a probability of belonging to that topic. Finally, we manually assign a label to each of
the topics produced by the model (A6) and examine the relationships between the proportions of
the documents estimated to belong to each topic, and our independent variables of interest (A7).
The process for the sentiment measures is much simpler and separate from the topic
model: after the cleaning step described above (A1), the sentiment scores are calculated using a
lexicon-based approach. Specifically, within each transcript, we measure the prevalence of words that
correspond to particular terms that have been labeled as representing certain categories of
sentiments and emotions, according to widely-used dictionaries – a process we describe in
greater detail below. This produces measures of the extent to which a given transcript segment
represents emotions such as ‘fear’ or ‘anticipation’ (B1). We then examine the prevalence of
certain emotions in a given transcript segment in relationship to our independent variables (B2).
--------------------------------------------------------------------------------- INSERT FIGURE 1 ABOUT HERE
---------------------------------------------------------------------------------
Latent Dirichlet Allocation
The LDA model treats each document as a bag of words, meaning that the word order is
not taken into account, and assumes an underlying random generative process in the creation of
the “corpus” – or the set of documents being analyzed. It assumes that the collection of
documents was generated by an imaginary probabilistic process, word by word, by first sampling
a topic from a given document’s distribution of topics and then sampling a word from that
topic’s word distribution. The sampling algorithm takes in the cleaned documents and then
works backward, returning the most probable set of topics to have produced the given set of
documents, if they had indeed been created in this imaginary way. A researcher can then infer
the meaningful subjects represented by these topics, as in the transportation example given
above, and calculate the proportions of each document estimated to belong to each topic.
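To make the assumed generative story concrete, the following minimal Python sketch simulates it for a single document. The vocabulary, the two topic distributions, and the function name `generate_document` are hypothetical and chosen only for illustration; they are not taken from our corpus or from any library.

```python
import random

def generate_document(theta, phi, vocab, n_words, rng):
    """Generate one document under LDA's assumed generative story.

    theta: topic proportions for this document (sums to 1).
    phi:   one word distribution per topic, aligned with `vocab`.
    """
    words = []
    for _ in range(n_words):
        # Step 1: draw a topic z from the document's topic mixture.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # Step 2: draw a word w from that topic's word distribution.
        w = rng.choices(vocab, weights=phi[z])[0]
        words.append(w)
    return words

# Hypothetical toy vocabulary and two toy topics (illustration only).
vocab = ["car", "train", "drive", "family", "father", "school"]
phi = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "transportation"-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "family"-like topic
]
doc = generate_document([0.8, 0.2], phi, vocab, n_words=10, rng=random.Random(0))
print(doc)
```

Fitting the model amounts to running this story in reverse: given only the words, the sampler estimates the `theta` and `phi` values most likely to have produced them.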
Mathematically, the model assumes each document consists of a random mixture of a
finite set of topics, and each topic represents a probabilistic distribution over the terms in the
“vocabulary”. A “vocabulary” refers to all of the terms used at least once across the entire
collection of documents. The model is essentially a Bayesian variant of Latent Semantic
Analysis, in which the topic distribution is given a Dirichlet prior (Griffiths and Steyvers 2004).
Specifically, the Dirichlet distribution is a probability distribution over the parameters of a
categorical distribution (i.e., over sets of non-negative weights that sum to one), and is often used as a prior in Bayesian mixture models.
--------------------------------------------------------------------------------- INSERT FIGURE 2 ABOUT HERE
---------------------------------------------------------------------------------
The resulting probabilistic generative process – the hypothetical way in which the documents are
assumed to have been created – is graphically represented by the plate diagram in Figure 2. The
larger plate indicates that the step is repeated for each of M documents, while the smaller plate
indicates that within a document, the step is repeated for each of the N words. To generate each
word in each document, the process consists of selecting a topic z from the document’s mixture
of topics, and then a word w from that topic’s vocabulary weightings.11 To determine the most
likely set of topics to have generated the collection of documents, we fit the model on the corpus
11 α and β are the hyperparameters for the Dirichlet priors on the topic distribution per document and the term distribution per topic, respectively. θm parameterizes the categorical distribution of the document’s topic mixture, while the topic’s vocabulary weightings have a categorical distribution with parameter Φz.
by employing a Gibbs sampling algorithm – a commonly-used method of iteratively sampling
until convergence is reached – as the optimal solution cannot be solved for directly.12 We run the
sampling using the topicmodels package in R (Hornik et al. 2011).
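For readers who want to see the mechanics of the sampler, a toy collapsed Gibbs sampler can be sketched in a few dozen lines of Python. This is an illustrative sketch only, not the topicmodels implementation used in our analysis; the function name `collapsed_gibbs_lda` and the default hyperparameter values are assumptions made for the example.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word strings.
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word probabilities, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    ndk = [[0] * K for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # total tokens per topic

    # Random initial topic assignment for every word token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Conditional weight of each topic given all other assignments.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return theta, phi, vocab
```

The sketch returns estimates from the final sampling state only; a production sampler would typically discard burn-in iterations and may average over several retained samples.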
Cleaning and Preprocessing
A number of preprocessing steps are necessary to ensure that an LDA model results in
coherent topics. In particular, for our oral history transcripts, we only used text that was spoken
by the interviewee so that we do not simultaneously model the thoughts and opinions of the
interviewer. Also, because 38 of the CEOs were interviewed in a language other than English
(specifically, either Spanish, Portuguese, or Turkish), we utilized the English translations of the
interview transcripts as our input data. We acknowledge that this might stand as a limitation of
our approach, as our model might be accounting for a translator’s own interpretations of a
CEO’s words rather than the CEO’s full expression in her native tongue. Our ultimate regression
analysis attempts to account for this potentially confounding factor by controlling for the CEO’s
national origin, but we recognize that this limitation would not exist if all of the interviews were
conducted in the same language.
In addition, the document cleaning process typically requires converting
all text to lowercase and removing punctuation and numeric characters. Another common step
is to remove all “stop words” – that is, common words such as “and” or “the” that give no
relevant information about the topic probability. Finally, stemming of words to their root form –
12 Because the underlying estimation problem is intractable, a number of approximation methods are typically used in estimating the LDA model, most commonly expectation-maximization algorithms and Markov chain Monte Carlo (MCMC) sampling methods (Yao, Mimno et al. 2009). In this analysis, we employ one of the MCMC methods, collapsed Gibbs sampling. This is a variant of the standard Gibbs sampling algorithm, a process of iteratively sampling the conditional probabilities of a joint distribution. By collapsing out (i.e., integrating over) the Dirichlet prior distribution, the algorithm encourages faster convergence (van Dyk and Park 2008).
an algorithmically-assisted process by which “run”, “runner”, and “running” would all be
reduced to the stem “run” (Lovins 1968) – is often helpful in achieving coherent topics.
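The cleaning steps described above can be sketched in Python as follows. The stop-word list is a deliberately tiny hypothetical one, and `crude_stem` is a simplistic suffix stripper written for this example; it is a stand-in for, not an implementation of, a published stemming algorithm such as that of Lovins (1968).

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "we", "i"}

def crude_stem(word):
    """Strip one common English suffix (illustrative stand-in for a real stemmer)."""
    for suffix in ("ing", "er", "es", "s"):
        too_short = len(word) - len(suffix) < 3
        plural_false_alarm = suffix == "s" and word.endswith("ss")
        if word.endswith(suffix) and not too_short and not plural_false_alarm:
            word = word[: -len(suffix)]
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

def preprocess(text):
    """Lowercase, drop punctuation and digits, remove stop words, then stem."""
    text = text.lower()
    # Keeps only ASCII letters; accented characters would need extra handling.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [crude_stem(tok) for tok in text.split() if tok not in STOP_WORDS]

print(preprocess("The runner was running in 2015!"))
```

In practice an established stemmer and a full stop-word list would replace both hypothetical components.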
More crucially to the context of oral histories and interviews, the length of each
document can have a powerful influence on the interpretability of an LDA model. For example, a
very long document may contain so many subjects that it is difficult for the algorithm to identify
a coherent set of topics, since the document is treated as a single bag of words. A frequent step
with longer documents is to break down the document into smaller, semantically coherent
segments (commonly 500 or 1000 words), a process for which a number of algorithms exist
(Riedl and Biemann 2012). However, the turn-taking design of an oral interview, which we
described earlier, provides a natural structure by which to segment each document. By removing
the interviewer’s questions and treating each response as its own segment, model performance
improves significantly. The model then treats each segment of a transcript (i.e., each response to
an interviewer’s question) as its own stand-alone document.
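A minimal sketch of this segmentation step is shown here. It assumes, purely for illustration, that each speaker turn in a transcript begins with a label such as "INTERVIEWER:" or "INTERVIEWEE:"; the actual transcripts may mark speakers differently, so the labels and the function name are hypothetical.

```python
def interviewee_segments(lines,
                         interviewee_label="INTERVIEWEE:",
                         interviewer_label="INTERVIEWER:"):
    """Return one text segment per interviewee response, dropping questions."""
    segments, current, speaking = [], [], False
    for raw in lines:
        line = raw.strip()
        if line.startswith(interviewer_label):
            # A new question closes any response in progress.
            if current:
                segments.append(" ".join(current))
            current, speaking = [], False
        elif line.startswith(interviewee_label):
            if current:
                segments.append(" ".join(current))
            text = line[len(interviewee_label):].strip()
            current, speaking = ([text] if text else []), True
        elif speaking and line:
            # Unlabeled line: continuation of the current response.
            current.append(line)
    if current:
        segments.append(" ".join(current))
    return segments

transcript = [
    "INTERVIEWER: How did the firm start?",
    "INTERVIEWEE: My father founded it.",
    "It began as a small shop.",
    "INTERVIEWER: And then?",
    "INTERVIEWEE: We expanded abroad.",
]
print(interviewee_segments(transcript))
```

Each returned string would then be passed through the cleaning steps and treated as a stand-alone document by the topic model.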
Output and Interpretation
Choosing the optimal number of topics for a topic model to produce over a set of
documents is often characterized as more of an art than a science. Measures of a model’s fit to
the corpus, such as perplexity and log likelihood, can provide some guidance. In this analysis, we
calculate the harmonic mean of the log-likelihood at various numbers of topics to pinpoint a
rough maximum (Griffiths and Steyvers 2004). It is worth noting that these measures do not
always line up exactly with human judgments of semantic coherence, and human judgment
remains the most popular way of selecting a final model (Chang, Boyd-Graber et al. 2009).
Coherence is typically best determined by examining the top most likely terms for each topic: a
good model should allow an observer to intuitively assign a title to each of the topics with a
quick glance at the most probable terms.
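As a sketch of the model-selection computation, the estimator of Griffiths and Steyvers (2004) takes the harmonic mean of the likelihood values obtained across Gibbs samples; on the log scale this can be computed stably as shown in this example. We assume here that per-sample log-likelihoods are already available, and the function name is hypothetical.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of likelihoods, given their logarithms.

    Computes log(S) - log(sum_s exp(-ll_s)) using the log-sum-exp trick
    so that very negative log-likelihoods do not overflow.
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(neg)) - log_sum

# Harmonic mean of the likelihoods 0.5 and 0.25 is 1/3.
print(math.exp(log_harmonic_mean([math.log(0.5), math.log(0.25)])))
```

One would evaluate this quantity for models fit with different numbers of topics and look for a rough maximum, before inspecting the candidate models by hand.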
Once a final model has been chosen, the estimated topics can provide a number of
directions for interpretation. The proportion of words estimated to have come from each topic
may be used as a measure of topic prevalence per document. As our corpus structure consists of
long documents split into segments, we collapse each topic proportion back to the original
document – i.e., interview transcript – by weighting each segment by its length. This process
allows for comparison with interview-level covariates of interest. Finally, inter-topic
relationships – frequent co-occurrence or clusters of topics – may provide additional insight into
the oral histories.
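The length-weighted aggregation from segments back to the interview level can be illustrated with a short Python sketch. The function name and the toy numbers are hypothetical; segment length is assumed to be measured in words after cleaning.

```python
def document_topic_proportions(segment_props, segment_lengths):
    """Length-weighted average of segment-level topic proportions.

    segment_props:   one list of topic proportions per segment.
    segment_lengths: number of words in each segment.
    """
    total = sum(segment_lengths)
    n_topics = len(segment_props[0])
    return [
        sum(props[k] * n for props, n in zip(segment_props, segment_lengths)) / total
        for k in range(n_topics)
    ]

# Toy example: a 30-word segment entirely on topic 0 and a
# 10-word segment entirely on topic 1.
print(document_topic_proportions([[1.0, 0.0], [0.0, 1.0]], [30, 10]))
```

Weighting by length ensures that a brief aside does not count as much as a long, detailed response.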
Sentiment Analysis
Separate from the topic model, sentiment analysis is a valuable way to get a sense of the
emotional valence of a document. These methods are usually dictionary-based. The sentiment
measures in this paper are calculated using the syuzhet R package (Jockers 2015), which employs
crowd-sourced lexicons developed by Saif Mohammad at the National Research Council of
Canada (2013). These lexicons correspond to eight primary emotions: anticipation, fear, joy,
sadness, trust, surprise, disgust, and anger. For each emotion, the terms in the lexicon have a binary value
for association. We sum the terms associated with each of the eight emotions at the sentence
level, and then calculate the proportion of each document dedicated to each emotion, so that the
values sum to one.
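The lexicon-based calculation can be sketched in Python as follows. The miniature lexicon is invented for illustration and is not the NRC lexicon, and the code is not the syuzhet implementation; it only mirrors the counting-and-normalizing logic described above.

```python
import re

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Hypothetical miniature lexicon for illustration; NOT the actual NRC word lists.
TOY_LEXICON = {
    "scam": {"anger", "disgust", "fear"},
    "problem": {"fear", "sadness"},
    "hope": {"anticipation", "joy", "trust"},
    "future": {"anticipation"},
    "sudden": {"surprise"},
}

def emotion_proportions(text, lexicon=TOY_LEXICON):
    """Share of emotion-word matches falling in each emotion category."""
    counts = {e: 0 for e in EMOTIONS}
    # Count matches sentence by sentence, then pool over the document.
    for sentence in re.split(r"[.!?]+", text.lower()):
        for token in re.findall(r"[a-z]+", sentence):
            for emotion in lexicon.get(token, ()):
                counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {e: 0.0 for e in EMOTIONS}
    return {e: counts[e] / total for e in EMOTIONS}

print(emotion_proportions("We hope the future is better. There was a scam."))
```

Because a single word may be associated with several emotions, the proportions describe the distribution of emotion associations rather than of words.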
Independent Variables: Event Study Methodology and Days since Peak Market Return
One of our major explanatory variables was the cumulative average abnormal returns in
the period prior to the interview, a method frequently used to study the financial impact of
external events, commonly known as event study methodology (Bromiley and Marcus 1989,
McNichols and Dravid 1990, Hendricks and Singhal 1996). In our sample of firms that
correspond to our set of CEO interviews, this variable was only calculated for the 48 public firms
for which we could find the necessary financial data. Return data was gathered from Bloomberg
using the public common equity of the target firm where available and the public common equity
of the parent when the firm equity was not available. Returns at the index level were then
gathered from Bloomberg for each of the corresponding stock indices related to the equity data.
All returns were reported in USD and taken between January 1st, 2005 and August 28th, 2017.
The interview date was assigned as the event date. Using the methodology outlined in
Fama et al. (1969), residual analysis was carried out using an estimation window from day -170 to
day -20 relative to the event date. The abnormal return value was calculated by subtracting the
value predicted by the model from the actual daily return value. Abnormal return values were
kept over the event window, which ran from day -19 to the event date (day 0).
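A minimal sketch of this calculation for a single firm is given here. It assumes a simple market model in which the firm's daily return is regressed on the corresponding index return over the estimation window; the function names and the keying of returns by relative trading day are assumptions made for the example.

```python
def fit_market_model(stock, market):
    """OLS fit of stock_t = a + b * market_t; returns (a, b)."""
    n = len(stock)
    mean_s, mean_m = sum(stock) / n, sum(market) / n
    cov = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock, market))
    var = sum((m - mean_m) ** 2 for m in market)
    b = cov / var
    return mean_s - b * mean_m, b

def cumulative_abnormal_return(stock, market,
                               est_start=-170, est_end=-20,
                               evt_start=-19, evt_end=0):
    """Sum of abnormal returns over the event window.

    stock, market: dicts mapping relative trading day (0 = interview
    date) to the daily return of the firm and of its index.
    """
    est_days = range(est_start, est_end + 1)
    a, b = fit_market_model([stock[d] for d in est_days],
                            [market[d] for d in est_days])
    # Abnormal return = actual return minus the model's prediction.
    abnormal = [stock[d] - (a + b * market[d])
                for d in range(evt_start, evt_end + 1)]
    return sum(abnormal)
```

With a firm whose returns follow the market model exactly during the estimation window and exceed it by one percentage point on each of the 20 event-window days, the function returns approximately 0.20.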
The other major explanatory variable of interest was the number of days since the date of
maximum returns for the index of the country of each interviewee. For each of the firms with
return data, the maximum value of returns was determined for the entire period of available data
between January 1st, 2005 and the event date, as well as 12 months and 6 months prior to the
event date. We then calculated the number of days between the date of peak value and the
interview date.
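The days-since-peak measure can be sketched as follows, assuming the index series is available as a mapping from calendar dates to values. The function name, the optional `window_start` argument (used here for the 12-month and 6-month variants), and the toy series are hypothetical.

```python
from datetime import date

def days_since_peak(index_values, interview_date, window_start=None):
    """Days between the peak of the index series and the interview.

    index_values: dict mapping datetime.date to the index value.
    Only dates on or before the interview (and on or after
    window_start, if given) are considered; ties go to the first
    date encountered.
    """
    eligible = {
        d: v for d, v in index_values.items()
        if d <= interview_date and (window_start is None or d >= window_start)
    }
    peak_date = max(eligible, key=eligible.get)
    return (interview_date - peak_date).days

# Invented toy series for illustration only.
toy_index = {
    date(2015, 3, 2): 0.010,
    date(2015, 6, 1): 0.031,
    date(2015, 7, 20): 0.018,
    date(2015, 9, 1): 0.050,  # after the interview, so ignored
}
print(days_since_peak(toy_index, date(2015, 8, 15)))
```

Restricting the window changes which peak is found, so the full-period, 12-month, and 6-month versions of the measure can differ for the same interview.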
Results
Topic Model and Descriptive Observation
--------------------------------------------------------------------------------- INSERT FIGURE 3 ABOUT HERE
---------------------------------------------------------------------------------
Figure 3 displays the top seven most likely terms for each topic for the final model used
in the analysis. Because the terms are stemmed, plural and verb endings are removed – for
example, the words “industries”, “industry”, and “industrialize” would be represented by the
stemmed version, “industr.” We have given labels to each topic according to our subjective
judgments of the primary subject of each topic. Many topics are industry-specific; for example,
Topic 1 appears to be concerned with manufacturing (“industr”, “plant”, and “technology” are
the top three stemmed words for Topic 1), Topic 16 centers on energy (“oper”, “power”,
“distribute”, “oil”), and Topic 30 appears to be about textiles and fashion (“women”, “design”,
“sari”, “visit”). Other topics appear to concern more general work-related subjects, such as
Topic 2, which seems to be about corporate social responsibility, Topic 3, which seems to be
concerned with hiring and human resources, and Topic 26, which has terms related to
management boards. Finally, a number of topics are more personal and appear to be concerned
with family (Topic 21), emotions (Topics 10 and 31), or life-related challenges (Topic 20).
Regressing indicators for region on the topic proportions can provide us with a sense of
which topics are most uniquely relevant for the three major regions in our sample: Asia, Africa
and Latin America. Figures 4, 5, and 6 display the coefficients from these regressions, sorted by
magnitude. These results serve primarily as a confirmation of the methodology. As expected,
certain industry topics and subjects are more prevalent for certain regions; for example, the
textile- and tea-related topics are associated with Asia, the transcripts from African executives
are more likely to discuss economic development and government, and the topics related to
tourism and mining are most predictive of Latin America. This coincides with our knowledge of
this particular sample of interviews, in which selection was geared toward well-known, iconic
examples of executives in a given country.
--------------------------------------------------------------------------------- INSERT FIGURE 4 ABOUT HERE
---------------------------------------------------------------------------------
--------------------------------------------------------------------------------- INSERT FIGURE 5 ABOUT HERE
---------------------------------------------------------------------------------
--------------------------------------------------------------------------------- INSERT FIGURE 6 ABOUT HERE
---------------------------------------------------------------------------------
Cumulative Abnormal Returns
The event study methodology provides us with an opportunity to examine the cumulative
abnormal returns (CAR) just prior to the interview date. While the actual event that may have
caused any abnormal stock returns is unobserved to us, we assume that the interviewee is well
aware of the aberration and that it may affect their attention and sentiments, as well as the topics
discussed.
--------------------------------------------------------------------------------- INSERT TABLE 2 ABOUT HERE
---------------------------------------------------------------------------------
Table 2 displays correlations between our measure of abnormal returns and the
proportions of each transcript associated with the eight primary emotions from the NRC
sentiment lexicon: anticipation, fear, joy, sadness, trust, surprise, disgust and anger. The NRC
measures are useful in that they allow for finer distinctions than simply positive and negative
sentiment. While we observe in our results that the abnormal returns have positive
relationships with all of the positive emotion categories and negative relationships with the
negative emotion categories, there are some notable differences. In particular, the largest correlations
are with anticipation (p=0.06, n=47), fear (p=0.09, n=47), and surprise (p=0.03, n=47) – all
sentiments that are more likely to be associated with unexpected events rather than matters of
course. Figure 7 displays plots of these relationships.
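For reference, the correlation between abnormal returns and an emotion proportion, together with the t-statistic on which its p-value is based, can be computed as in the following sketch. The function names are hypothetical, and converting the statistic to a p-value would additionally require the t distribution with n - 2 degrees of freedom, which is omitted here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t-statistic for testing r = 0 with n observations (n - 2 d.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

With a sample of 47 interviews, even moderate correlations yield sizeable t-statistics, which is why the sample size is reported alongside each p-value.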
--------------------------------------------------------------------------------- INSERT FIGURE 7 ABOUT HERE
---------------------------------------------------------------------------------
Beyond the NRC sentiment measures, we examined the correlations between the
abnormal returns and several groupings of topics. Table 3 displays these correlations. The
“work” and “life” categories are two discrete categories that divide the topics into subjects
related to work versus subjects related to the interviewee’s life, and the “emotions” category is a
subset of the life-related topics that are specifically related to emotion (Topic 10: Emotion and
Gratitude, Topic 20: Challenges, Topic 21: Family, Topic 28: Growth, and Topic 31: Emotion).
There is a positive relationship between the abnormal returns measure and the proportion of the
interview that is devoted to talking about work.
--------------------------------------------------------------------------------- INSERT TABLE 3 ABOUT HERE
---------------------------------------------------------------------------------
Finally, we examine the effects of the abnormal stock returns on the work and life topics
groupings using ordinary least squares (OLS) regressions at the segment level. Table 4 displays
the results of these models, adding in fixed effects for years (Models 2 and 5) and indicators for
region (Models 3 and 6). Model 3, the full model predicting work-related topics, for example,
has the following specification:
$$\text{Work}_{ij} = \beta_0 + \beta_1 \text{CAR}_j + \beta_2 \text{Asia}_j + \beta_3 \text{Africa}_j + \text{Year}_j + \varepsilon_{ij}$$
Workij represents the proportion of segment i in document j estimated to belong to work-related
topics and β1 is the coefficient estimate for the effect of CARj, the cumulative abnormal returns.
β2 and β3 represent the coefficient estimates for the effects of the Asia and Africa regions,
respectively, while Yearj represents fixed effects for each year in our sample.
A one standard deviation increase in abnormal returns is associated with approximately
0.05 standard deviation increase in discussing work-related categories, and a corresponding
decrease in life-related categories. As a robustness check, we employed a hierarchical linear
modeling (HLM) approach to account for the possibility of autocorrelation between document
segments. The results of these models are discussed at the end of this section.
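The segment-level specification can be illustrated with a small, dependency-free OLS sketch. It solves the normal equations directly and treats Latin America and the first listed year as the omitted baseline categories; these choices, the function names, and the omission of standard errors are simplifications made for the example and do not describe our estimation software.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def design_row(car, region, year, years):
    """Row [1, CAR, Asia, Africa, year dummies] for one segment.

    Latin America and the first entry of `years` are the omitted
    baseline categories.
    """
    return (
        [1.0, car, float(region == "Asia"), float(region == "Africa")]
        + [float(year == yr) for yr in years[1:]]
    )
```

Each segment contributes one row, with the document-level covariates repeated across all segments of the same interview; this repetition is the source of the within-document dependence that the hierarchical models are meant to address.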
--------------------------------------------------------------------------------- INSERT TABLE 4 ABOUT HERE
---------------------------------------------------------------------------------
Days Since Peak Return
One of the shortcomings of the event study methodology is that only slightly more than half of
the interviews in this sample are associated with a publicly listed company. In order to make
better use of the private firms in our sample, we turn to the market index for the country
associated with each CEO interviewee’s firm, calculating the days since the date of peak market
return at the time of the interview. Higher values for this measure indicate more prolonged
periods of negative market performance. Once again, this is a measure that we expect may affect
both the attention and sentiment of the executives being interviewed.
--------------------------------------------------------------------------------- INSERT TABLE 5 ABOUT HERE
---------------------------------------------------------------------------------
Table 5 displays the correlations between the time since peak market returns and the
NRC sentiment categories. Most notably, there is a strong negative association between the days
since peak market returns and trust (p=0.002, n=82). Figure 8 displays this relationship
graphically. Although the sentiment categories show no straightforward positive-negative
pattern, the association with trust stands out. One interpretation of this
result is that sustained weak market performance may reduce the CEO interviewee’s trust that
the future is likely to improve.
--------------------------------------------------------------------------------- INSERT FIGURE 8 ABOUT HERE
---------------------------------------------------------------------------------
Finally, we examined the correlations between the time since peak returns and the topic
categories discussed above (work, life, and emotions). Here, we note a positive correlation between
the days since peak returns and the proportion of the document associated with emotions (p=0.02,
n=82). This dovetails with some of the relationships we observe above with the NRC sentiments,
in which the time since peak returns is associated not only with negative sentiments such as sadness
and disgust, but also positive sentiments such as joy. Table 6 displays these correlations, and
Figure 9 graphically displays the relationship between the days since peak returns and the emotion-
related topic categories.
--------------------------------------------------------------------------------- INSERT TABLE 6 ABOUT HERE
---------------------------------------------------------------------------------
--------------------------------------------------------------------------------- INSERT FIGURE 9 ABOUT HERE
---------------------------------------------------------------------------------
Table 7 shows the results of an OLS model regressing emotion-related topic proportions
on the days since peak index returns, once again with the document segment as the smallest unit.
The full model (Model 3) follows the specification:
$$\text{Emotions}_{ij} = \beta_0 + \beta_1 \text{DaysSincePeak}_j + \beta_2 \text{Asia}_j + \beta_3 \text{Africa}_j + \text{Year}_j + \varepsilon_{ij}$$
--------------------------------------------------------------------------------- INSERT TABLE 7 ABOUT HERE
---------------------------------------------------------------------------------
This specification is identical to the OLS specification for the cumulative abnormal returns model,
except that Emotionsij represents the prevalence of emotion-related topics in segment i of
document j, and the main effect estimated by β1 is that of the days since peak index returns for
document j.
According to Table 7, a one standard deviation increase in the time since peak returns
value appears to be associated with approximately 0.02 of a standard deviation increase in
discussion of emotions-related topics, once region effects are included (Model 3).
For robustness to the possibility of the autocorrelation of variables within documents, we
examined several other model specifications using a Hierarchical Linear Modeling approach. Point
estimates for both explanatory variables of interest were largely consistent with the OLS estimates
when employing random intercepts models, as was a random slopes model estimation of the effect
of cumulative abnormal returns on the prevalence of work, life, and emotions topics (Raudenbush
and Bryk 2002). A random slopes specification with the days since peak returns variable did not
converge. Estimated standard errors varied based on the specification: while the random slopes
model estimate of the coefficient of abnormal returns and the random intercepts model estimate of
the coefficient of days since peak returns were robust to the addition of year fixed effects, both
estimated effects shrank when region indicators were added (p > .05, two-tailed test). These results
are available upon request.
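For readers unfamiliar with the approach, a random intercepts variant of the days since peak returns specification can be sketched as follows. This is the generic two-level form described in Raudenbush and Bryk (2002); the symbols $u_j$, $\tau^2$ and $\sigma^2$ are notation assumed here for illustration and are not reported estimates.

```latex
\begin{aligned}
\text{Emotions}_{ij} &= \beta_0 + \beta_1\,\text{DaysSincePeak}_j + \beta_2\,\text{Asia}_j
  + \beta_3\,\text{Africa}_j + \text{Year}_j + u_j + \varepsilon_{ij}, \\
u_j &\sim N(0, \tau^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

Here $u_j$ is a document-level intercept shared by all segments of interview $j$, so segments from the same interview are allowed to be correlated, with intraclass correlation $\tau^2 / (\tau^2 + \sigma^2)$.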
Discussion
In this paper, we develop a new methodology using topic modeling and sentiment analysis with
application to oral history data, which we argue is an underutilized resource in strategy
scholarship. Our core contribution is to illustrate a method that could render oral history
data more accessible for strategy researchers. For the purposes of illustration alone, we employ
our novel methodology to study how environmental factors affect CEO communication. We find
that a one standard deviation increase in abnormal returns on the day of the CEO interview is
associated with approximately half a standard deviation increase in discussing work-related
content, and a corresponding decrease in life-related categories. In addition, increases in
cumulative abnormal returns are correlated with the CEOs expressing more surprise and less fear
in their language. We also find that a one standard deviation increase in the time since peak
returns value appears to be associated with approximately a tenth of a standard deviation increase
in the discussion of emotions-related topics in CEO interviews. Finally, more time since peak
returns is also correlated with a CEO’s tendency to use trust-related terms in her interview.
Our results contribute to several literatures, notably the literature arguing in favor of
historical analysis in strategy research, the literature on qualitative analysis in strategy research,
and the research on upper echelons, managerial attention, managerial cognition, and cognitive frames.
Our results also contribute to the emerging literature on how CEOs spend their time.
The exposition of our novel methodology to utilize oral history data adds to the relatively
thin literature on the use of historical data in strategy research. In particular, Jones and Khanna
(2006) outline two dimensions of historical data that make it difficult to use in broad strategy
research – such data is often “qualitative” and often “small sample”. The authors then suggest
methods that strategy scholars could use to analyze historical data and list methods related to
Boolean algebra (Ragin, 1987), string analyses (Abbott, 2001) and computational models
(O’Rourke and Williamson, 1999). Oral history data often shares the qualitative and small
sample properties outlined by Jones and Khanna (2006) and our novel methodology provides
strategy scholars yet another empirical tool to use to further historical analysis in strategy
research. In effect, we show that even with a small sample of interviews (n = 88), segmenting
each interview transcript allows for a meaningful quantitative analysis through topic modeling.
Because topic models tend to generate unstable and meaningless output when the input documents
are long, the text of a typical oral history transcript is generally not well suited for topic
modeling. Indeed, each of the interviews in our dataset lasts between one and two hours. However,
by taking the additional step of segmenting each transcript based on its turn-taking structure,
we demonstrate how to pre-process oral history transcript texts for appropriate use with natural
language processing techniques like topic modeling.
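The segmentation step can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it supposes that each speaking turn begins on a new line with a speaker label followed by a colon, and the labels `INTERVIEWER` and `CEO`, the function names `split_turns` and `segments`, and the `min_words` threshold are all hypothetical choices rather than details of our pipeline.

```python
import re

# Assumed layout: a turn starts at the beginning of a line with "LABEL:".
TURN = re.compile(r"^([A-Z][A-Za-z .'-]*):\s*", re.MULTILINE)


def split_turns(transcript):
    """Return (speaker, text) pairs, one per speaking turn."""
    # With one capture group, re.split yields
    # [preamble, speaker1, text1, speaker2, text2, ...].
    parts = TURN.split(transcript)
    return [(parts[i].strip(), " ".join(parts[i + 1].split()))
            for i in range(1, len(parts) - 1, 2)]


def segments(transcript, respondent="CEO", min_words=5):
    """Keep respondent turns long enough to serve as topic-model documents."""
    return [text for speaker, text in split_turns(transcript)
            if speaker == respondent and len(text.split()) >= min_words]


sample = """INTERVIEWER: How did the firm start?
CEO: We started in a small workshop with three employees
and grew through exports in the early years.
INTERVIEWER: And the family?
CEO: Very supportive.
"""

for segment in segments(sample):
    print(segment)
```

Each retained segment would then be treated as one document when fitting the topic model, which keeps the input texts short even though the underlying interviews are long.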
Our methodology also provides firms and strategy scholars with an empirical means to conduct
“temporal search”, i.e. search for knowledge created at different points in the past. Such
knowledge might be “frozen in time” in oral history or other historical records. Temporal search
of historical data might help firms create competitive advantage through subsequent acts of
innovation and organizational renewal. Analyses of such data using our methodology might help
strategy scholars answer questions related to how temporal search and analyzing the past can be
used by firms to create competitive advantage in the future.
An important contribution of our study is the exposition of a replicable methodology for
using qualitative data such as oral history. Our analysis is based on replicable algorithms and the
use of publicly available interview transcripts (all pre-processed interview transcripts are
available from the authors upon request). The use of transcripts that can be shared and the use of
replicable topic modeling tools make it possible to reproduce our analysis. This is unlike other
qualitative studies where the full interview transcripts, field notes and coder inputs are usually
not available to other scholars.
More broadly, our results (though not the core contribution of our paper) contribute to the
literature on managerial attention. In this literature, Ocasio (1997, 2011) builds on Simon (1947)
to outline the premise of “situated attention”, which posits that the answers decision makers
focus on depend on the particular context or situation they find themselves in. These situated
answers in turn manifest in “procedural and communication channels” such as action
memoranda, quarterly and annual reports, etc. However, while action memoranda and
quarterly/annual reports run the risk of being written by employees of the corporate
communications team and additionally run the risk of being sanitized prior to publication, oral
history data and CEO interviews provide a relatively unfiltered peek into the situated attention
of the CEO and arguably represent an underutilized communication channel that should be
studied by scholars in the literature on managerial attention.
Our methodology could also be more broadly employed in research in strategy on upper
echelons, managerial attention, managerial cognition and cognitive frames. One of the core
propositions of the literature on upper echelons (Hambrick and Mason, 1984; Hambrick 1994) is
that managers act on the basis of their personalized interpretations of the strategic situations they
face; however, the literature has not outlined any precise methodology to measure managers’
cognitive frames. To quote Hambrick (2007), “demographic characteristics of executives can be
used as valid, albeit incomplete and imprecise proxies of executives’ cognitive frames”
(Hambrick 2007, page 335; italics added by authors). Arguably, topic modeling of oral history or
other comparable interview data could help provide a complementary toolkit to code cognitive
frames of managers.
In more recent literature on cognitive frames, Kaplan (2008) defines frames as “means by
which managers make sense of ambiguous information from their environments.” In this
literature, Kaplan (2008) uses CEO letters to shareholders and content analysis to measure
managerial cognition. However, the author also alludes to other sources of data that could be
used to measure managerial cognition, including data obtained through CEO interviews, akin to
the oral history data we use. For Kaplan (2008), “other kinds of statements by CEOs, such as
those obtained through interviews or surveys, might initially appear to be attractive (data)
sources, but they are impractical for larger samples of firms over long periods” (Kaplan 2008,
page 679). One of the reasons oral history data has been “impractical” to use in strategy research
so far has been the absence of a robust methodology to use such data. The methodology outlined
in our paper is a step in that direction.
Our methodology could also be employed in the strategy research related to
interpretation. In this stream of research, Barr (1998) traces managers’ interpretations over time
as they grapple with environmental events, and the author uses CEO letters to shareholders as
well as copies of CEO speeches from the Wall Street Transcripts to conduct causal reasoning
analysis (Axelrod 1976; Huff et al., 1990) and Ward’s (1962) method of cluster analysis.
Arguably, the analysis of oral history using the methodology of topic modeling and sentiment
analysis will provide a complementary source of data and a complementary analytical tool kit to
researchers grappling with such questions.
We also contribute to the literature on how CEOs spend their time. In a recent study in
this literature, Bandiera et al. (2017) outline that one of the most important activities on which
CEOs spend their time is communication, both inside and outside the organization. Our results
indicate that CEO communication is related to market and firm performance. It is also plausible
that market and firm performance are correlated with how CEOs choose to allocate their time.
Our study has several limitations. First, because our data are limited to interviews with
CEOs of firms in emerging markets, we cannot generalize our results about how CEOs’ emotions
and topical attention relate to the economic environment in which their firms are situated to
CEOs of firms in developed or under-developed economies. In other words, it is possible that in
a developed economy, a CEO’s emotions might not be as sensitive to greater cumulative
abnormal returns as they would be for CEOs in emerging markets. We encourage researchers to
adopt our methods in future projects that might examine such a comparison. In addition, in
terms of data limitations, as Kaplan (2008) states, the study of oral interview data suffers from
the risk of retrospective bias as managers would likely adapt their memories of their views in
prior years to subsequent outcomes. We partially circumvent this issue by employing our data
and methodology to study how market outcomes affect memories (i.e., how abnormal returns on
the day of the CEO interview affect memory and CEO communication), rather than studying how
memories of events are related to outcomes. Our methodology is also limited by the fact that the
machine learning process only uses text and is unable to use video or audio material. In
analyzing CEO emotions, it is plausible that coders using video/audio material are better able to
“visualize” emotions such as disgust in the facial expressions and/or voice intonations of the
CEO.
As for other technical limitations, we also can only account for differences in the region-
of-origin for our CEO interviewees and the firms they represent. However, as a feature of the
interview data collection, the CEOs’ regions are also associated with whether or not the
interviews themselves were conducted in English. For instance, most CEOs from South
American countries were interviewed in their native Spanish, which meant that our analysis
could only incorporate the English translations of their interview transcripts. Future research
might look into the sensitivity of topic model results to translation effects. Finally, although our
approach utilized unsupervised LDA to estimate topic models, it is possible that a supervised
approach could produce more meaningful topic estimates (Ramage et al., 2009). A supervised
approach would require researchers to read through a sample of transcripts and to associate
certain words with pre-determined topics, giving the topic model a fixed prior for structuring the
relationship between estimated topics. A supervised approach is encouraged when the language
used in a corpus of documents has excessive jargon, such that relevant experts would be able to
identify which specific and salient words should cohere together as a topic. The language in our
interviews arguably does not reflect the excessive use of jargon, but it is possible that other oral
business histories exhibit higher proportions of industry-specific terminology.
In conclusion, we document a novel and replicable methodology for using qualitative
data such as oral history in strategy research. Our methodology is based on using easily available
oral history transcripts and a replicable method based on topic modeling and sentiment analysis.
We also develop a proof of concept of using our methodology and provide evidence suggesting
that CEO communication is correlated with firm and market performance. This result is relevant
for scholarship on how environmental factors affect managerial attention, managerial cognition
and the allocation of CEO time on communication. Most importantly, our methodology opens
the door for strategy scholars to use easily available, yet under-utilized oral history archives
around the world.
Selected References
Bandiera, O; L Guiso, A Prat, R Sadun, “What do CEOs do?”, Review of Financial Studies, 2017, Forthcoming
Bandiera, O., Lemos, R., Prat, A. and Sadun, R., 2013. Managing the family firm: evidence from CEOs at work (No. w19722). National Bureau of Economic Research.
Barber, Brad M., and John D. Lyon. "Detecting long-run abnormal stock returns: The empirical power and specification of test statistics." Journal of financial economics 43.3 (1997): 341-372.
Barley, S. R. 1990 "Images of imaging: Notes on doing longitudinal field work." Organization Science. 1:220-247.
Blei, D. M., et al. (2003). "Latent Dirichlet Allocation." Journal of Machine Learning Research 3.
Bromiley, P. and A. Marcus, "The Deterrent to Dubious Corporate Behavior: Profitability, Probability and Safety Recalls," Strategic Management J., 10 (1989), 233-250.
Calori, R., Johnson, G. and Sarnin, P., 1994. CEOs' cognitive maps and the scope of the organization. Strategic Management Journal, 15(6), pp.437-457.
Chang, J., et al. (2009). "Reading Tea Leaves: How Humans Interpret Topic Models." Neural Information Processing Systems.
Chatterjee, A. and Hambrick, D.C., 2011. Executive personality, capability cues, and risk taking: How narcissistic CEOs react to their successes and stumbles. Administrative Science Quarterly, 56(2), pp.202-237.
D'Aveni, R.A. and MacMillan, I.C., 1990. Crisis and the content of managerial communications: A study of the focus of attention of top managers in surviving and failing firms. Administrative science quarterly, pp.634-657.
Daft, R.L., Sormunen, J. and Parks, D., 1988. Chief executive scanning, environmental characteristics, and company performance: An empirical study. Strategic management journal, 9(2), pp.123-139.
Delgado‐García, J.B., La Fuente‐Sabaté, D. and Manuel, J., 2010. How do CEO emotions matter? Impact of CEO affective traits on strategic and performance conformity in the Spanish banking industry. Strategic Management Journal, 31(5), pp.562-574.
Duncan, R.B., 1972. Characteristics of organizational environments and perceived environmental uncertainty. Administrative science quarterly, pp.313-327.
Dunning, J.H., 1998. American investment in British manufacturing industry. Taylor & Francis US.
Fama, Eugene F., et al. "The adjustment of stock prices to new information." International Economic Review 10.1 (1969): 1-21.
Fama, E.F. and French, K.R., 1993. Common risk factors in the returns on stocks and bonds. Journal of financial economics, 33(1), pp.3-56.
Gamache, D.L., McNamara, G., Mannor, M.J. and Johnson, R.E., 2015. Motivated to acquire? The impact of CEO regulatory focus on firm acquisitions. Academy of Management Journal, 58(4), pp.1261-1282.
Gao, C., Zuzul, T., Jones, G. and Khanna, T., 2017. Overcoming Institutional Voids: A Reputation‐Based View of Long‐Run Survival. Strategic Management Journal.
Griffiths, T. L. and M. Steyvers (2004). "Finding scientific topics." PNAS 101.
Hambrick, D.C. and Mason, P.A., 1984. Upper echelons: The organization as a reflection of its top managers. Academy of management review, 9(2), pp.193-206.
Hambrick, D.C. and Macmillan, I.C., 1985. Efficiency of product R&D in business units: The role of strategic context. Academy of Management Journal, 28(3), pp.527-547.
Helfat, C.E. and Peteraf, M.A., 2015. Managerial cognitive capabilities and the microfoundations of dynamic capabilities. Strategic Management Journal, 36(6), pp.831-850.
Hendricks, K. B. and V. R. Singhal, "Quality Awards and the Market Value of the Firm: An Empirical Investigation," Management Sci., 42 (1996), 415-436.
Herrmann, P. and Nadkarni, S., 2014. Managing strategic change: The duality of CEO personality. Strategic Management Journal, 35(9), pp.1318-1342.
Hill, R.C. and Levenhagen, M., 1995. Metaphors and mental models: Sensemaking and sensegiving in innovative and entrepreneurial activities. Journal of Management, 21(6), pp.1057-1074.
Hiller, N.J. and Hambrick, D.C., 2005. Conceptualizing executive hubris: the role of (hyper‐)core self‐evaluations in strategic decision‐making. Strategic Management Journal, 26(4), pp.297-319.
Hornik, Kurt, and Bettina Grün. "topicmodels: An R package for fitting topic models." Journal of Statistical Software 40.13 (2011): 1-30.
Huang, A., Lehavy, R., Zang, A., and Zheng, R. 2017. “Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach.” Management Science, Forthcoming.
Jacobsen, R., 1988. The persistence of abnormal returns. Strategic management journal, 9(5), pp.415-430.
Jones, G. and Khanna, T., 2006. Bringing history (back) into international business. Journal of International Business Studies, 37(4), pp.453-468.
Jones, G., 2005. Multinationals and global capitalism: From the nineteenth to the twenty first century. Oxford University Press on Demand.
Khanna, Tarun and Krishna G Palepu (with Richard Bullock), Winning in emerging markets, Harvard business press, 2010.
Kogut, B.M. ed., 1993. Country competitiveness: Technology and the organizing of work. Oxford University Press on Demand.
Larcker, D.F. and Zakolyukina, A.A., 2012. Detecting deceptive discussions in conference calls. Journal of Accounting Research, 50(2), pp.495-540.
Lefebvre, L.A., Mason, R. and Lefebvre, E., 1997. The influence prism in SMEs: The power of CEOs' perceptions on technology policy and its organizational impacts. Management Science, 43(6), pp.856-878.
Lehavy R, Li F, Merkley K (2011) The effect of annual report readability on analyst following and the properties of their earnings forecasts. Accounting Rev. 86(3):1087–1115
Lovins, J. B. (1968). "Development of a Stemming Algorithm." Mechanical Translation and Computational Linguistics 11.
Loughran T, McDonald B (2016) Textual analysis in accounting and finance: A survey. J. Accounting Res. 54(4):1187–1230
McNichols, M. and A. Dravid, "Stock Dividends, Stock Splits, and Signaling," J. Finance, 45 (1990), 857-879.
Mintzberg, H., 1987. Crafting strategy (pp. 66-75). Boston, MA, USA: Harvard Business School Press.
O'Sullivan, M., 2001. Contests for corporate control: Corporate governance and economic performance in the United States and Germany. OUP Catalogue.
Portelli, A., 2009. What makes oral history different. Oral history, oral culture, and Italian Americans, pp.21-30.
Ramage, D., Hall, D., Nallapati, R. and Manning, C.D., 2009. “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora.” Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. (pp. 248-256). Association for Computational Linguistics.
Raudenbush, S.W. and A.S. Bryk. 2002. Hierarchical linear models: Applications and data analysis methods (Vol. 1). Thousand Oaks, CA: Sage.
Riedl, M. and C. Biemann (2012). "Text segmentation with topic models." Journal for Language Technology and Computational Linguistics 27(1).
Salancik, G.R. and Meindl, J.R., 1984. Corporate attributions as strategic illusions of management control. Administrative science quarterly, pp.238-254.
Saretzky, G.D., 1981. Oral History in American Business Archives. The American Archivist, 44(4), pp.353-355.
Shaw, R. (2015). "Automatically Segmenting Oral History Transcripts." arXiv preprint arXiv:1509.08842.
Tasker, S.C., 1998. Bridging the information gap: Quarterly conference calls as a medium for voluntary disclosure. Review of Accounting Studies, 3(1), pp.137-167.
Thompson, P., 2017. The voice of the past: Oral history. Oxford university press.
van Dyk, D. A. and T. Park (2008). "Partially Collapsed Gibbs Samplers." Journal of the American Statistical Association 103(482): 790-796.
Vernon, R., 1966. International investment and international trade in the product cycle. The quarterly journal of economics, pp.190-207.
Watzlawick, P., Beavin, J.H. and Jackson, D.D., 1967. Pragmatics of human communication.
Wilkins, M., 1970. The emergence of multinational enterprise: American business abroad from the colonial era to 1914 (Vol. 34). Cambridge, Mass: Harvard University Press.
Wilkins, M., 1974. Multinational Oil Companies in South America in the 1920s: Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, and Peru. Business History Review, 48(3), pp.414-446.
Yadav, M.S., Prabhu, J.C. and Chandy, R.K., 2007. Managing the future: CEO attention and innovation outcomes. Journal of Marketing, 71(4), pp.84-101.
Yao, L., et al. (2009). "Efficient methods for topic model inference on streaming document collections." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.
Figures and Tables
Figure 1: Overview of Methodological Process
Figure 2: Plate Diagram for Latent Dirichlet Allocation (LDA)
Figure 3: Top Terms for Topic Model
Figure 4: Topics Most Predictive of Asia Region
Figure 5: Topics Most Predictive of Africa Region
Figure 6: Topics Most Predictive of Latin America Region
Figure 7: Relationships Between CAR and Selected Sentiment Measures
Figure 8: Relationship Between Days Since Peak Returns and Trust Sentiment
Figure 9: Relationship Between Days Since Peak Returns and Emotion-Related Topics
Table 1: Oral History Archives
Archive Name | Source | Notable and/or Relevant Projects
Harvard University Creating Emerging Markets Initiative | http://www.hbs.edu/creating-emerging-markets/interviews/Pages/default.aspx |
UCLA Center for Oral History | http://oralhistory.library.ucla.edu/Browse.do?coreDescCvPk=27901&Subject=Business | Entrepreneurs of the West
Columbia University Oral History Archives | http://library.columbia.edu/locations/ccoh.html | Various interviews with executives and entrepreneurs
World Bank Oral history archive | http://oralhistory.worldbank.org/ |
Indiana University Center for the Study of History and Memory | http://www.indiana.edu/~cshm/alphalist.html | Indiana Automobile Industry, Generations Auto Workers
University of California Berkeley Oral History Collection | http://www.lib.berkeley.edu/libraries/bancroft-library/oral-history-center/search-oral-histories | Venture Capital, Amgen, Biotechnology, Business archives
University of Connecticut Oral History | http://www.oralhistory.uconn.edu/catalog.html | Connecticut Workers and a Half Century of Technological Change, 1930-1980
University of Kentucky Louie B. Nunn Center for Oral History | https://kentuckyoralhistory.org/ | Kentucky Entrepreneurial History Collection
The British Library | https://www.bl.uk/collection-guides/oral-histories-of-business-and-finance | An Oral History of the Electricity Supply Industry, Prudential Interviews
The History Factory | http://www.historyfactory.com/ | The History Factory helps large firms chronicle their own histories through interviews
History Associates | https://www.historyassociates.com/who-we-serve/our-clients/ | Same as The History Factory
University of Florida Oral History Collections | http://ufdc.ufl.edu/ohfbl | Florida Business Leaders Oral History Collection
Table 2: Correlations Between Abnormal Returns and NRC Sentiments
Table 3: Relationship Between Abnormal Returns and Topic Categories
Table 4: OLS Regressions of Cumulative Abnormal Returns on Topic Categories
Table 5: Relationship Between Days Since Peak Returns and NRC Sentiments
Table 6: Relationship Between Days Since Peak Return and Topic Categories
Table 7: OLS Regressions of Days Since Peak Index Returns on Topic Categories