
Modeling Oral Business History Data: An Application to Markets and CEO Communication

Prithwiraj (Raj) Choudhury Natalie A. Carlson

Dan Wang Tarun Khanna

Working Paper 18-064


Copyright © 2018 by Prithwiraj (Raj) Choudhury, Dan Wang, Natalie A. Carlson, and Tarun Khanna

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.


Prithwiraj (Raj) Choudhury Harvard Business School

Natalie A. Carlson Columbia University

Dan Wang Columbia University

Tarun Khanna Harvard Business School


We draw attention to oral business history data, which are widely available yet underutilized, as a relevant and useful resource for strategy and management scholars. Specifically, we outline a novel methodology based on topic modeling and sentiment analysis and use it to illustrate how such data can generate insight into the relationship between a firm’s past financial performance and the language and tone of its CEO’s spoken words. Using 88 CEO interviews conducted between 2008 and 2017 and archived in the publicly accessible Harvard Business School online repository “Creating Emerging Markets”, we study how environmental factors such as unexpected market returns relate to the range of topics and sentiments that CEOs express. To do so, we match the interview data with financial performance data for the firms led by these CEOs and with market performance data for the countries in which the firms are headquartered. Our results suggest that greater abnormal returns just prior to the date of a CEO interview are positively correlated with certain emotions expressed by the CEO, such as surprise, fear, and anticipation. This pattern suggests that unanticipated firm performance may heighten the emotional states of firm leadership, with possible consequences for subsequent strategy choices. Greater abnormal returns are also positively correlated with a CEO’s tendency to focus on work-related rather than personal topics.



Introduction

Strategy and international business (IB) scholars have a rich tradition of engaging with historical data (Dunning 1998, Vernon 1966, Kogut 1993, O’Sullivan 2001, Jones 2005). Scholars such as Raymond Vernon, John Stopford and Geoffrey Jones were deeply influenced by the work of business historians such as Mira Wilkins (Wilkins 1970, 1974). However, the systematic investigation of historical evidence has since disappeared from the more recent research agendas of strategy and IB scholars. According to Jones and Khanna (2006), historical data can provide a meaningful complement to cross-sectional analyses in studying many conceptual issues that can shape the development of new theory in strategy research. In this paper, we extend their argument and introduce a novel methodology that combines oral history data, topic modeling, and sentiment analysis. In particular, we argue that oral history is a rich data source for strategy research that has been underexploited because of the absence of a robust methodology for using such data. We develop and apply our approach to study a key strategy question: how do contextual economic factors affect the topical and emotional content of CEO communication?

Oral history data has been used extensively by historians, and by business historians in particular (Portelli 2009; Thomson 2017). However, to the best of our knowledge, the use of oral history data has not permeated strategy scholarship. This is despite the availability of rich oral business history archives in multiple public databases. For example, as early as the 1970s and 1980s, extensive collections of oral history data concerning American business activity had been preserved and made available, including the oral history project initiated by Ford Motor Company in collaboration with Columbia University and similar projects at Sears, Roebuck and Company, Atlantic Richfield, and other large corporations (Saretzky 1981).1 Although oral history data has been instrumental in illuminating past events in historical narratives, one of the stumbling blocks to using it in strategy scholarship has been the absence of a robust methodology outlining how oral history data might inform the analysis of strategy research questions.

In this paper, we employ oral history data from “Creating Emerging Markets”, a Harvard Business School curated archive of interviews with CEOs. The archive consists of a unique collection of transcripts of interviews with the CEOs of 88 firms, recorded by researchers at Harvard Business School between 2008 and 2017. Part of these data was used in a recent paper that employed inductive-deductive theory building methods to study how firms competing in emerging markets are able to survive over the long term (Gao et al., 2017). Whereas Gao et al. (2017) develop insights through traditional qualitative analysis, we use the same data to demonstrate a new and complementary quantitative approach to qualitative data. Specifically, we outline and employ a novel methodology based on state-of-the-art topic modeling to generate dependent variables related to the dispersion of topics and the sentiment of the content discussed by CEOs in these interviews.

Our choice of topic modeling follows recent strategy scholarship that has increasingly used the technique to take advantage of the availability of text-based data about firm activity (Kaplan and Vakili 2015). In short, topic modeling offers a systematic, quantitative way of measuring the prevalence of a set of topics, each characterized by a set of keywords, across the content of a collection of documents. We apply topic modeling to oral history interview transcripts associated with a set of business leaders in emerging economies.
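Topic models produce, for each document, a vector of topic proportions. As a minimal sketch of how such output might be turned into document-level variables, the Python below assumes that a topic model (for example, latent Dirichlet allocation) has already been fit and that each topic has been hand-labeled as work-related or personal. It computes two illustrative measures: the Shannon entropy of a transcript's topic proportions, one possible operationalization of topical dispersion, and the share of topic mass on work-labeled topics. The function names, the labels, and the use of entropy are our assumptions for illustration, not necessarily the measures used in this study.

```python
import math

def topic_entropy(theta):
    """Shannon entropy (in nats) of one document's topic proportions.

    Higher values mean the document spreads its content over more topics.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic proportions must sum to a positive value")
    probs = [p / total for p in theta]  # renormalize defensively
    return -sum(p * math.log(p) for p in probs if p > 0)

def work_share(theta, labels):
    """Share of topic mass on topics labeled 'work' (labels are hypothetical)."""
    if len(theta) != len(labels):
        raise ValueError("need exactly one label per topic")
    total = sum(theta)
    return sum(p for p, lab in zip(theta, labels) if lab == "work") / total

# Hypothetical 4-topic distribution for a single transcript.
theta = [0.5, 0.2, 0.2, 0.1]
labels = ["work", "work", "personal", "personal"]
print(round(topic_entropy(theta), 3), round(work_share(theta, labels), 3))
```

Entropy is maximal (the log of the number of topics) when a transcript spreads evenly across all topics and zero when it concentrates on a single topic, which is why it is a convenient, if not the only, dispersion measure.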

1 As the author notes, as far back as 1979, Theresa McHugh Palmersheim, a graduate student at American University, found that only one-fifth of the 500 largest American firms had archives; of the fifty-five firms responding to her questionnaire, twenty indicated that their archives included oral histories.


Because the textual structure of oral history interview transcripts is unique, one of our main contributions is to describe a replicable approach for preparing such data for analysis. We argue that with our approach or similar ones, strategy scholars can benefit from exploiting rich databases of interview-based oral business history to explore how firm-level variables might affect an executive’s interpretation of her leadership role and influence in a firm.
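One recurring preparation step for interview transcripts is separating the interviewee's words from the interviewer's prompts, so that downstream measures reflect only what the CEO said. The sketch below assumes a hypothetical transcript convention in which each turn begins on a new line with an upper-case speaker label followed by a colon (for example, "INTERVIEWER:"). Real archives format speaker turns in different ways, so the pattern would need to be adapted; the function names and the sample transcript are illustrative.

```python
import re

# Hypothetical convention: each turn starts at the beginning of a line with an
# upper-case speaker label and a colon, e.g. "INTERVIEWER:" or "CEO:".
TURN_PATTERN = re.compile(r"^([A-Z][A-Z .'-]*):\s*", re.MULTILINE)

def split_turns(transcript):
    """Return a list of (speaker, text) tuples in transcript order."""
    matches = list(TURN_PATTERN.finditer(transcript))
    turns = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        text = " ".join(transcript[m.end():end].split())  # collapse whitespace
        turns.append((m.group(1).strip(), text))
    return turns

def interviewee_text(transcript, interviewer_labels=("INTERVIEWER",)):
    """Concatenate every non-empty turn not spoken by an interviewer."""
    return " ".join(text for speaker, text in split_turns(transcript)
                    if speaker not in interviewer_labels and text)

sample = (
    "INTERVIEWER: How did the firm start?\n"
    "CEO: We began with one hotel.\n"
    "It grew slowly.\n"
    "INTERVIEWER: And then?\n"
    "CEO: Then we expanded.\n"
)
print(interviewee_text(sample))
```

Keeping the turn structure (rather than discarding interviewer lines outright) also leaves room to analyze how CEOs respond to particular prompts.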

To demonstrate our approach on a relevant empirical question, we apply our methodology and data to a question that strategy scholars have long studied: how are contextual economic factors and environmental uncertainty related to the topical and emotional content of CEO communication? Past work has theorized a link between the content and tone of a CEO’s communication and the immediate economic and environmental factors that have shaped the performance of the CEO’s firm (Lefebvre et al., 1997; D’Aveni and MacMillan, 1990). More broadly, our study also relates to work on the factors that influence the cognitive frames of top managers (Tripsas and Gavetti, 2000; Hambrick 2007; Kaplan, 2008), as well as to the wider literature on situated attention and the environment within which managers make decisions (Barnard 1938; Ocasio, 1997).

Given that the focus of our paper is methodological (that is, we introduce a novel method for incorporating oral history data into strategy research), we do not claim to provide causal empirical results for our research question. Instead, we use topic models of the oral history data to construct dependent variables that capture the emotional state of our CEOs and their attention to work or personal topics, and we correlate these variables with independent variables constructed through an event study analysis of firm-level abnormal stock returns (Jacobsen 1988; Fama and French 1993; Barber and Lyon 1997).
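The event-study logic behind the independent variables can be illustrated with the market model, one common benchmark: a firm's normal return is estimated by regressing its returns on market returns over a pre-event estimation window, abnormal returns over the event window are the differences between realized and predicted returns, and their sum is the cumulative abnormal return (CAR). The sketch below is a simplified illustration under stated assumptions: the window lengths, the benchmark index, and whether a multi-factor model such as Fama and French (1993) is used instead are not specified here, and all return series are made up.

```python
from statistics import mean

def market_model(stock, market):
    """OLS fit of stock_t = alpha + beta * market_t over the estimation window."""
    mx, my = mean(market), mean(stock)
    cov = sum((m - mx) * (r - my) for m, r in zip(market, stock))
    var = sum((m - mx) ** 2 for m in market)
    beta = cov / var
    alpha = my - beta * mx
    return alpha, beta

def cumulative_abnormal_return(est_stock, est_market, evt_stock, evt_market):
    """Sum of (realized - predicted) returns over the event window."""
    alpha, beta = market_model(est_stock, est_market)
    abnormal = [r - (alpha + beta * m) for r, m in zip(evt_stock, evt_market)]
    return sum(abnormal)

# Made-up daily returns. The estimation window is constructed so that
# alpha = 0.001 and beta = 1.2 (up to floating-point error).
est_market = [0.010, -0.020, 0.015, 0.000, -0.005]
est_stock = [0.001 + 1.2 * m for m in est_market]
# Hypothetical event window, e.g. trading days just before an interview.
evt_market = [0.010, 0.020]
evt_stock = [0.030, 0.035]

print(round(cumulative_abnormal_return(est_stock, est_market,
                                       evt_stock, evt_market), 4))
```

In this toy example the predicted event-window returns are 0.013 and 0.025, so the abnormal returns are 0.017 and 0.010 and the CAR is 0.027. In applied work the estimation window usually ends well before the event window so that the event itself does not contaminate the benchmark.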


We argue that understanding the factors that shape a CEO’s attention and emotional state, which can be revealed through an analysis of the language in their own spoken words, is an important complement to existing work that links a CEO’s emotions and subjective state of mind to their imminent strategic choices. Notably, Delgado-García and De La Fuente-Sabaté (2010) argue that CEOs who display positive long-term affective traits, such as optimism or anticipation, tend to be associated with strategic choices that conform less to industry standards. Related studies have also investigated the relationship between certain aspects of a CEO’s personality, such as narcissism, extraversion, or openness, and subsequent strategic decision-making (Chatterji, et al 2010, Herrman and Hadkarni 2013, Gamache, et al 2015, Hiller and Hambrick 2005).

Our results show a remarkably robust relationship between the emotions and type of content expressed by a CEO and the broader economic factors that have recently shaped the performance of the CEO’s firm. Specifically, larger cumulative abnormal returns for a CEO’s firm are associated with greater anticipation, surprise, and fear in the CEO’s language in the interview transcripts we analyze. In other words, unexpected shocks to firm performance may place CEOs in an elevated emotional state. In addition, greater cumulative abnormal returns also predict that CEOs concentrate more on work-related topics in their interviews, indicating that unexpected positive patterns in their firms’ recent performance can influence CEO attention. Importantly, the transcripts we use come from open-ended, semi-structured interviews; thus, the topics that the CEOs in our dataset address in the course of an interview are largely their own choices. In summary, our findings illustrate a vital but under-examined pathway through which unanticipated changes in a firm’s performance might affect its future strategy: exogenous environmental factors that affect a firm’s performance in unexpected ways may produce shifts in the firm’s strategy because they can provoke changes in the emotional state and attention of the firm’s CEO.

Our analytical approach to making sense of oral histories based on transcripts of CEO interviews has several methodological advantages over more conventional means of interpreting and summarizing written communication data. Because we adopt a systematic quantitative approach to measuring the emotional and topical content of a CEO’s oral communication, our procedure can be readily replicated on other datasets. Moreover, our results are not contingent on the unobserved subjective biases of human coding, which has conventionally been used to prepare qualitative data for analysis. Similarly, because our method relies on machine learning, it reduces the often high cost and effort that human coding imposes on the analysis of text-based data. Thus, we lower the barrier to entry for future researchers who might wish to adapt our techniques to similar data.

Our methodology and results are relevant to several literatures in strategy, notably the nascent literature that argues for incorporating historical data into strategy research (Jones and Khanna, 2006; Gao, et al 2017). They are also relevant to the literature on CEO attention (Daft et al., 1988; Calori et al., 1994; Lefebvre et al., 1997; Yadav et al., 2007) and to the broader literatures on managerial attention (Ocasio, 1997, 2011), managerial interpretation, and managerial cognition (Barr 1998; Tripsas and Gavetti, 2000; Kaplan, 2008). Given that CEOs spend a large share of their time communicating within and outside the organization, we also add to the literature on how CEOs spend their time (Porter and Nohria, 2010; Bandiera et al., 2017). Finally, by highlighting the influence of contextual factors on the quality and content of qualitative data, we contribute to the long-standing tradition of using qualitative data in strategy research (Hatch and Schultz 2017, Corbin and Strauss 2008, Eisenhardt 1991, Barley 1990).

Theory: CEO Communication and Contextual Factors

The Chief Executive Officer (CEO) arguably occupies the most central and important leadership role in any firm, as the executive principally charged with setting firm strategy (Hambrick and Mason, 1984). One of the most important ways that a CEO might influence firm strategy is by communicating her ideas to internal and external stakeholders (D’Aveni and MacMillan, 1990; Lefebvre et al., 1997; Yadav et al., 2007).

Beginning with Mintzberg (1987), firm strategy has been characterized as an abstraction in the minds of managers. Calori et al. (1994) characterized the CEO as a “cognizer”, an individual who integrates views within the top management team and communicates the integrated view to internal and external stakeholders. Building on this work, the subsequent strategy literature has studied the effect of CEO communication on firm-level outcomes. In a relatively recent study, Yadav et al. (2007) coded CEO communication using the letters to shareholders featured in firms’ annual reports.2 Using these data, the authors showed that certain features of CEO communication, specifically greater internal and external focus, can have a “positive and long term impact on how firms detect, develop and deploy new technologies over time” (Yadav, et al 2007: 84). Other recent work has used related data, such as CEOs’ written diaries, to investigate how executives divide their attention among various firm and personal activities (Bandiera, et al 2013).
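The dictionary-based coding that Yadav et al. (2007) apply to shareholder letters (footnote 2 describes, for instance, “future focus” as the frequency of the word ‘will’) can be sketched as counting category words per 1,000 tokens. Apart from ‘will’, the word lists below are hypothetical placeholders rather than the dictionaries those authors used, and the per-1,000-token normalization is our own choice for illustration.

```python
import re
from collections import Counter

def relative_frequencies(text, dictionaries):
    """Hits per 1,000 tokens for each category's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty text
    return {cat: 1000 * sum(counts[w] for w in words) / n
            for cat, words in dictionaries.items()}

# Only "will" comes from the description of future focus; the other word
# lists are illustrative placeholders.
DICTIONARIES = {
    "future_focus": {"will"},
    "external_focus": {"customers", "competitors", "market"},
    "internal_focus": {"employees", "organization", "processes"},
}

letter = "We will serve our customers better. Our employees will lead."
print(relative_frequencies(letter, DICTIONARIES))
```

Normalizing by length keeps long and short letters comparable; note that this simple tokenizer would not count contractions such as "we'll" as "will".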

2 The authors used data from the CEOs’ letters to shareholders to code several explanatory variables, such as “future focus” (coded as the frequency of the word ‘will’ in the letters), “external focus” (coded as the frequency of words in the letters denoting outward attention to customers and competitors), and “internal focus” (coded as the frequency of words in the letters denoting inward attention to organization-specific issues).


A related literature in strategy and organizations studies how contextual and environmental factors relate to CEO communication. In the strategy literature, Hambrick and MacMillan (1985) stated that “context refers to the environment and broad organizational milieu in which the innovative attempt is situated” (Hambrick and MacMillan, 1985: 529). Hambrick and MacMillan (1985) take cues from Duncan (1972), who suggested that the “environment” consists of the relevant physical and social factors outside the boundary of an organization that are taken into consideration during organizational decision making (Daft et al., 1988: 124). Daft et al. (1988), in turn, argue that contextual uncertainty increases information processing within firms and that, faced with such uncertainty, the CEO must identify and interpret problems and opportunities and communicate strategic adaptations to changing environmental conditions accordingly.

Lefebvre et al. (1997) extend this line of reasoning one step further, arguing that it is the CEO’s perceptions of the environment, not its objective realities, that shape CEO communication and firm strategy. They introduce the term “prism effect” to describe how the objective realities of the external environment are filtered through the personal biases of the CEO. This “prism effect”, in turn, influences CEO communication and moderates how a firm’s technology policy affects its realized innovative efforts. In the same literature, D’Aveni and MacMillan (1990) compared senior managers’ letters to shareholders during demand-decline crises for 57 bankrupt firms and 57 matched survivors. The authors found that under environmental uncertainty, the CEOs of surviving firms not only pay disproportionate attention to the output environment of the firm, but their communication to shareholders also more strongly reflects these structural differences in attention.


In many of the studies we have reviewed on CEO communication and CEO attention, the workhorse methodological tool has been content analysis of written communication by the CEO. More broadly, content analysis of written communication, such as CEO letters to stakeholders, has a long tradition in strategy research (Watzlawick et al., 1967; Salancik and Meindl, 1984). Although written communication has several features that make it well suited to content analysis, a glaring omission is the absence of analyses of the CEO’s oral communication records.3 Pfeffer (1981) argued that the content analysis of language helps provide evidence of the origins of problems within the organization and of organizational responses to such problems. Arguably, both written and oral communication should reflect the perceived realities of CEOs, as articulated by Lefebvre et al. (1997). In addition, in a recent paper, Helfat and Peteraf (2015) outline several characteristics of oral communication by CEOs and the effect such communication might have on individual workers and firm strategy:

The communication style of top managers in general, and the way in which they communicate a vision for the organization in particular, can inspire workers, encourage initiative, and drive entrepreneurial growth (Baum, Locke, and Kirkpatrick, 1998; Westley and Mintzberg, 1989). Managerial skill in using language, such as through impromptu talks, flow of words, and articulation in conversation, may affect worker response to change initiatives (Helfat and Peteraf, 2015; page 843).

Despite the recognized value of the data embedded in CEOs’ oral communication, to the best of our knowledge, no studies in the strategy literature use the oral communication records of CEOs as a main data source to better understand the relationship between CEO communication, attention, and firm outcomes. We suggest that this rich data source has been underexploited by strategy scholars because of the absence of a robust methodology for using such data. To address this gap, in the following sections we develop a novel methodology for using oral history data (oral interviews with CEOs) and use these data to study the effect of contextual factors on CEO communication.

3 However, some work in accounting has examined firms’ tendency to engage shareholders through quarterly conference calls, often drawing on the transcripts of those calls, which have been analyzed using, among other methods, topic modeling (Tasker 1998, Lehavy, et al 2011, Larcker and Zakolyukina 2012, Huang, et al Forthcoming). For a more comprehensive review of the accounting literature, see Loughran and McDonald (2016).

Unique Features of Oral Business History Transcripts

Our primary data come from oral histories in the form of interview transcripts with business leaders. Oral histories occupy a unique place among historical data sources. Most primary source data, such as bookkeeping notes or census records, are sought by historians because they form a legitimate basis for describing objective reality in the past. The value of oral history, however, lies in precisely the opposite. Because oral histories are collected by recording an individual’s thoughts about certain topics in a certain context, subjectivity is a salient and inherent feature of oral history data, even when the speaker is reconstructing a narrative of events. Transcripts of oral histories reflect a speaker’s thoughts and biases “in the moment” of their delivery, whereas written communication records, such as letters to shareholders, are deliberately planned and revised and are often influenced by the different stakeholders involved in drafting a written message. According to Portelli (1981: 100-01), oral histories reveal the dispositional states of speakers at a given point in time, often in relation to “not just what people did, but what they wanted to do, what they believed they were doing, what they now think they did.” In other words, an oral history offers a glimpse into the psychological stakes of a particular memory that a speaker recalls.

Beyond the value of their subjectivity, oral histories from business leaders are a particularly useful resource for organizational scholars because they can reveal not only the events and activities leading up to important business decisions, which are often undocumented, but also a speaker’s interpretation of and feelings about those events. Keulen and Kroeze (2012) identify three unique advantages of using oral history data from organizational leaders for piecing together and making sense of business history. Namely, they characterize “oral business history” as 1) “archival”, 2) “scientific”, and 3) “democratic”. We describe each of these features below, arguing that each makes oral business history data attractive as a resource for strategy scholarship.

Oral business history refers to oral histories about the activities of business organizations (Perks 2010). The analysis of oral history data primarily takes the form of content analysis of the transcripts associated with oral records. Because many institutional efforts to collect oral history data have aimed to give voice to strata of society that are typically oppressed, oral business histories are relatively rare as a data source: few historians have sought to document the spoken accounts of the elites who are seen as the voices of business activity. As such, Keulen and Kroeze (2012) argue that oral business histories are uniquely “archival” because they are often the only way to learn why certain decisions were made, given that many firm decisions lack paper trails or other documentation.

In addition, oral business histories are “scientific” as data sources because they are often collected through semi-structured or unstructured interviews with the leaders of an organization. Keulen and Kroeze (2012) use the term “scientific” to describe the validity of oral history data as an accurate representation of a speaker’s impressions and thoughts, given that speakers have free rein to express themselves. Conducting a relatively open-ended interview requires the interviewer to research the interviewee in advance, which signals that both interviewer and interviewee take the conversation seriously and thereby encourages interviewees to speak more candidly during a recording.

Finally, business oral histories are “democratic” because, although business leaders often appear in the public spotlight, they are beholden to multiple powerful stakeholders and often feel that they cannot speak freely (Abrams 2010: 161). As such, oral history interviews with executives tend to liberate the personal sentiments and thoughts of business leaders who might find it lonely at the top. In other words, CEOs and other business leaders often see an “oral history interview as an opportunity to tell their side of the story” (Keulen and Kroeze 2012).

Oral business histories are therefore ideally suited as a data source for exploring how contextual factors affect the communication style and content of CEOs. Unlike official written communications from CEOs, which convey the premeditated and largely edited views of a firm’s leadership, oral business histories offer insight into the subjectivity of CEO communication and language, which can arguably affect how CEOs make decisions. Because of their candor, oral histories collected through interviews are arguably one of the most valuable sources of data on a CEO’s sensitivities, emotions, tastes, and opinions, yet they remain vastly under-exploited.

We also emphasize the value of oral histories related to business decisions because an enormous institutional effort has been undertaken to make such data freely available and accessible. Table 1, for example, lists selected archives, mostly housed in university libraries, that contain a diverse array of interviews with business leaders covering a wide range of industries, regions, and topics. One notable resource is Columbia University’s Oral History Archive, which has been widely acknowledged as the largest searchable database of oral history records in the world and which provides access not only to audio and video records of interviews with business executives but also to their accompanying transcripts. Given the rich tradition in strategy research of explaining firm behavior through an understanding of the dispositional or emotional state of firm leadership, we argue that it is imperative for strategy scholars to exploit these archives to inform novel theory building.

---------------------------------------------------------------------------------

INSERT TABLE 1 ABOUT HERE

---------------------------------------------------------------------------------

Importantly, we distinguish between the analysis of transcripts of oral business histories and the analysis of their audio and visual content. In analyzing the transcripts of interviews with CEOs below, we focus on the explicit language interviewees use to share their thoughts, but we cannot incorporate measurements of physical appearance or aural indicators, which also contain information about a CEO’s emotional state or attention. Sociolinguists and computer scientists have developed methods for analyzing features of speech, such as prosody and pitch, and for relating them to emotional variation and meaning. Deploying similar techniques offers a promising pathway for future researchers to take even greater advantage of oral business history records. However, to keep our analysis focused, we limit the scope of our methodological description and demonstration to transcripts.

Data: ‘Creating Emerging Markets’ Oral History Archives at Harvard Business School

Our oral history data come from the ‘Creating Emerging Markets’ archive, which contains video recordings of interviews with 88 CEOs of multinational companies headquartered in emerging economies. The interview transcripts, which we used as our primary input data, were downloaded from the interviews section of the Harvard Business School’s “Creating Emerging Markets” initiative.4 The objective of this initiative is to analyze the evolution of business leadership in Africa, Asia and Latin America.5 As the curator of the oral history archives describes on the website: “Creating Emerging Markets explores the evolution of business leadership in Africa, Asia, and Latin America. At its core are interviews, many on video and by Harvard Business School faculty members, with leaders or former leaders of businesses and NGOs. These interviews, with men and women of diverse backgrounds, address pivotal moments of transition in their organizations. They contain compelling insights on entrepreneurship, innovation, family business, and the globalization of firms and brands. Emphasizing ways that businesses can create value for their societies, the project provides a unique resource for research and teaching.” According to the website for the archive (see footnote 5), the archive was envisioned as a public goods project, designed to be available to scholars and educators worldwide.

The interviews themselves were conducted by Harvard Business School faculty with the leaders or former leaders of businesses and non-governmental organizations, and they focus particularly on significant moments of transition in their regions. Although the interviews were largely structured around certain “core themes”, they were also conducted with an open-ended approach by design.6 Currently, the archive contains interviews with 88 CEOs, conducted between 2008 and 2017. Each interview lasted between one and two hours, and each was transcribed and approved by the interviewee prior to public distribution through the archive website. While some of these interviews were held on the Harvard Business School campus, most were conducted around the world, typically in the city in which the CEO’s firm or organization was based. The project intentionally sought out CEOs over the age of 60 as interviewees to reduce informant bias, given that older informants could be more frank because their words no longer affected their career prospects (Gao et al., 2017). Examples of the firms that the interviewed CEOs represented include the Tata Group, Claro y Cía and the United Bank for Africa. Of the organizations interviewed, 34 were from South America, 10 from Central America, 26 from Asia, and 18 from Africa.7

4 These interviews can be found at: http://www.hbs.edu/creating-emerging-markets/interviews/Pages/default.aspx. Accessed on July 31, 2017.

5 A more complete description can be found at: http://www.hbs.edu/creating-emerging-markets/about/Pages/default.aspx. Accessed on July 31, 2017. This project is physically located in Baker Library at the Harvard Business School.

6 http://www.hbs.edu/creating-emerging-markets/research/Pages/default.aspx. Accessed on July 31, 2017.

Topical and emotional variation in CEO oral histories

In analyzing the transcripts of the CEO interviews in the “Creating Emerging Markets” archive, we aim to illustrate how the language used by executives might reflect their deliberate and endogenous choices about which topics to discuss and which emotions to express, two elements that, we argue, might be affected by shifts in the broader economic environment in which our CEOs’ firms are situated. Qualitative analysis of interview transcripts and other forms of natural language data typically relies on the subjective interpretations of individual researchers, who piece together a narrative or description to inform a particular theoretical explanation. For example, using the same set of interviews that we analyze in our study, Gao et al. (2017) identify key passages to support their theorization of how leaders in emerging markets use the reputation of their firms to overcome the institutional voids that characterize emerging markets (Khanna and Palepu, 2010).

7 As Gao et al. (2017) document, the interviews began with open-ended questions asking the informant to describe his or her firm’s development from inception to the present. The firms were chosen using theoretical sampling of the most successful firms in each country. As Gao et al. (2017) also note, while this sampling technique led to the selection of the most successful firms in each emerging market, such “theoretical sampling” also lends itself well to theory development (Eisenhardt and Graebner 2007; Pratt 2009; Eisenhardt, Graebner and Sonenshein 2016).


We argue that even in discussing objective historical events related to their companies,

speakers vary in the extent to which they, for example, concentrate on how those events relate to

either their firms’ activities or their own personal careers. In addition, speakers also express

different emotions in their choice of language in describing the past. As such, we emphasize

that understanding how oral histories are expressed is just as important as what oral histories

describe about the past.

For example, in almost every transcript, CEOs describe the challenges they faced in

growing their businesses, often attributing such barriers to government regulations or market

conditions. However, in characterizing these barriers, speakers used language that ranged from

conveying fear to expressing anticipation. Prithvi Raj Singh Oberoi – the CEO of EIH Limited,

a luxury hotel chain in India – recounted the history of government regulations and corruption

regarding travel and tourism in India as a barrier to the growth of the country’s hotel industry.

However, after recalling several key events that made it difficult for hotel operators to survive in

India, Oberoi also conveyed a sense of both anticipation and fear:

Well, I think enough has been said about corruption and red tape in India. I won’t dwell on that too much because everybody knows what has happened in the past, a lot has been written and in the media, television, and the newspapers and magazines every year we see—every month in fact we see some scam or other…. Things are better now…I think big businesses understand, all businessmen are understanding now, and in the long run [this corruption] doesn’t pay. But is there realization amongst the government [officials]? I think there is now. And Rome was not built in a day; we have been through a lot of problems in this country in the last 60 years.8

On the one hand, Oberoi signals optimism for the future of the growth of the hotel

industry in India, noting that issues regarding the “corruption” and “red tape” in government

regulations have improved over the past several decades and that even business leaders have

8 Interview with Prithvi Raj Singh Oberoi, Executive Chairperson, EIH Limited. Conducted by Ryan Buell and Ananth Raman on August 15, 2015. Video can be found at http://www.hbs.edu/creating-emerging-markets/interviews/Pages/profile.aspx?profile=prsoberoi (last accessed: September 27, 2017).


adapted. However, Oberoi also conveys hesitation about the future in considering the amount of

time that it has taken for adherence to regulations to improve. In other words, although there

certainly exists a subjective emotional tone in Oberoi’s words, identifying the precise mood of

these words would represent a subjective judgment that might depend on the disposition and

biases of the reader.

The ambiguity of the emotional valence of Oberoi’s reflection can also be found in the

remarks of Jim Damalas – CEO of Greentique Hotels, an upscale chain of hotels in Costa Rica –

who similarly spent a large proportion of his interview describing the challenges to growth in the

hotel industry in Costa Rica. Specifically, Damalas outlines the unique features of Costa Rica as

a tourist destination, recalling the history of events involving the government’s effort to protect

national parks and to preserve other indigenous ecosystems. Importantly, the success of his hotel

chain was tied to articulating a value proposition that hewed closely to the principles of eco-

tourism. In this context, however, Damalas expressed both trepidation and hope about the future

of competition in the hotel industry in Costa Rica, due to the emergence of international chains,

which he felt were not as keen to adapt to the local market:

Four Seasons as a corporation has a bigger budget than Costa Rica ever will have—to market a country. They have [more than that] to market a chain…So, if Costa Rica doesn’t understand how to grasp that and control it, because there should be a playing field for all types of market niches and for all types of businesses, what’ll happen is the same thing that Walmart’s doing and all the other big boys: they’re taking all us guys out. And when you do that, in a country based on individuality, pride, self-esteem—when you took out the military, in a country that really hasn’t had invasions or slavery for so long, and you bring in too many people from the outside that aren’t in love with the country, and that may not be in love with the people, you’ll change the vibe, and we’re seeing that in some of the big boutique properties, from what I hear.9

9 Interview with Jim Damalas, Founder and CEO, Greentique Hotels. Conducted by Andrew Spadafora on June 4, 204. Video can be found at http://www.hbs.edu/creating-emerging-markets/interviews/Pages/profile.aspx?profile=jdamalas (last accessed: September 27, 2017).


In both interviews, Damalas and Oberoi recall with vivid detail the specific conversations

and events that color the stories of their companies’ growth trajectories. The parts of their

interviews that contain their reflections and pontifications, however, are just as important in

establishing their subjective perspectives about the events that they identify as key moments in

their firms’ histories. Thus, although oral histories serve as valuable sources of facts, which

might be used to triangulate, verify, or deepen the interpretation of an existing narrative, they

also contain a layer of information about a CEO’s overall attitude and viewpoint. Importantly,

however, there is subjectivity in how readers might interpret the emotional tone of a CEO’s

spoken words, which presents a barrier to reliably measuring the emotional content of an

interview transcript.

Thus, although past work has established that a CEO’s state of mind and patterns of attention are important for understanding how a firm’s leadership relates to its strategic decision-making, the nuances of natural language in an interview transcript that might reveal a CEO’s emotional disposition, for instance, are not

uniformly detectable by human readers. For example, in the interview excerpts above, different

readers might have different interpretations of the CEOs’ emotional valence on account of the

ambiguity of their language in these passages. However, it is clear that through their own spoken

words, CEOs do indeed convey their feelings and personal sentiments even when recounting

events that are intended to form an objective historical reality. The empirical challenge for

researchers is to systematically measure the type and extent of the emotions or other dimensions

of topical attention without relying on the subjective interpretations of human readers. Our aim

is to introduce a replicable, quantitative approach for operationalizing a CEO’s attention to

certain topics and emotional state when articulating details about past events.


Methodology

Overview

We argue that oral histories are rich data sources that are underutilized, in part, because

they are often long, unwieldy, and difficult to parse and summarize in a systematic way (Shaw

2015). Interview transcripts are additionally complicated by a distinct “turn-taking” structure.

Here, turn-taking refers to the pattern in which an interviewer’s question is followed by the

interviewee’s response, which is then followed by an interviewer’s question, and so on.

Topic models – a class of statistical models that “discover” the fixed set of underlying

topics in a collection of documents – provide an opportunity to reveal the subjects discussed in

an oral history transcript. Colloquially, a “topic” refers to a theme or a subject matter; in topic

modeling, these subject matters are represented by an ordered probability weighting of words. As

a simple example, a topic model for a set of documents about local government services might

produce a topic with its highest probability weights on the terms “car”, “train”, “drive”, and

“fly”, leading the researcher to infer that the topic refers to the subject of transportation. These

models provide an unsupervised way of describing what subjects occur in a set of texts, determined by which terms are most likely to co-occur within documents.10 Within this group of models, Latent Dirichlet Allocation (LDA), the model we employ in this analysis, is one of the simplest and most broadly used (Blei, Ng et al. 2003).

In the subsequent sections, we describe in detail the underpinnings of the LDA model and

the procedures we use to employ it, but as a broad overview, Figure 1 provides a visual roadmap

of the approach. The topic modeling process (steps A1-A7) involves several cleaning and

10 Computational linguists call this approach to topic modeling “unsupervised”, which refers to the notion that humans do not associate any words in the documents being analyzed with a pre-specified label for a topic prior to estimating the topic model.


preprocessing procedures, in which the transcripts are cleaned (see previous section), certain

superfluous text and characters are removed, and the transcripts are split into segments (A1-A3).

Following these steps, the LDA sampling algorithm is performed (A4) and the resulting topics

are examined (A5). These two steps may be repeated until a final model is selected. At this stage,

each topic that is output by the model is represented by a set of words, in which each word is

assigned a probability of belonging to that topic. Finally, we manually assign a label to each of

the topics produced by the model (A6) and examine the relationships between the proportions of

the documents estimated to belong to each topic and our independent variables of interest (A7).

The process for the sentiment measures is much simpler and separate from the topic model: after the same cleaning step (A1), the sentiment scores are calculated using a lexicon-based approach. Specifically, within each transcript, we measure the prevalence of words that

correspond to particular terms that have been labeled as representing certain categories of

sentiments and emotions, according to widely-used dictionaries – a process we describe in

greater detail below. This produces measures of the extent to which a given transcript segment

represents emotions such as ‘fear’ or ‘anticipation’ (B1). We then examine the prevalence of

certain emotions in a given transcript segment in relationship to our independent variables (B2).

---------------------------------------------------------------------------------
INSERT FIGURE 1 ABOUT HERE
---------------------------------------------------------------------------------

Latent Dirichlet Allocation

The LDA model treats each document as a bag of words, meaning that word order is

not taken into account, and assumes an underlying random generative process in the creation of

the “corpus” – or the set of documents being analyzed. It assumes that the collection of

documents was generated by an imaginary probabilistic process, word by word, by first sampling


a topic from a given document’s distribution of topics and then sampling a word from that

topic’s word distribution. The sampling algorithm takes in the cleaned documents and then

works backward, returning the most probable set of topics to have produced the given set of

documents, if they had indeed been created in this imaginary way. A researcher can then infer

the meaningful subjects represented by these topics, as in the transportation example given

above, and calculate the proportions of each document estimated to belong to each topic.
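The generative story described above can be made concrete by simulating it forward, which is the reverse of what the sampling algorithm does. The following is a minimal Python sketch under stated assumptions: the helper names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`), the vocabulary, the number of topics, and the symmetric hyperparameter values are hypothetical illustrations, not the settings used in our analysis. Dirichlet draws are produced by normalizing independent gamma variates, a standard construction.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing
    independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(vocab, n_topics, doc_lengths, alpha=0.5, beta=0.1, seed=0):
    """Simulate the generative process that LDA assumes (hypothetical settings).

    Each topic z gets a word distribution phi_z ~ Dirichlet(beta).
    Each document m gets a topic mixture theta_m ~ Dirichlet(alpha); then, for
    each word slot, a topic z ~ Categorical(theta_m) and a word w ~ Categorical(phi_z).
    """
    rng = random.Random(seed)
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs, thetas = [], []
    for n_words in doc_lengths:
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(n_words):
            z = sample_categorical(theta, rng)               # pick a topic
            words.append(vocab[sample_categorical(phi[z], rng)])  # pick a word
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi
```

For example, `generate_corpus(["car", "train", "tea", "sari"], n_topics=2, doc_lengths=[5, 3])` returns two bags of words plus the mixtures that produced them; estimation runs this logic backward, inferring `thetas` and `phi` from `docs` alone.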

Mathematically, the model assumes each document consists of a random mixture of a

finite set of topics, and each topic represents a probabilistic distribution over the terms in the

“vocabulary”. A “vocabulary” refers to all of the terms used at least once across the entire

collection of documents. The model is essentially a Bayesian variant of probabilistic Latent Semantic Analysis, in which the topic distribution is given a Dirichlet prior (Griffiths and Steyvers 2004). The Dirichlet distribution is a distribution over probability vectors, that is, over the parameters of a categorical distribution on a discrete set of outcomes, which makes it a natural prior in Bayesian mixture models.


The resulting probabilistic generative process – the hypothetical way in which the documents are assumed to have been created – is graphically represented by the plate diagram in Figure 2. The larger plate indicates that the step is repeated for each of M documents, while the smaller plate indicates that within a document, the step is repeated for each of the N words. To generate each word in each document, the process first draws a topic z from the document’s mixture of topics and then draws a word w from that topic’s vocabulary weightings.11 To determine the most

likely set of topics to have generated the collection of documents, we fit the model on the corpus

11 $\alpha$ and $\beta$ are the hyperparameters for the Dirichlet priors on the topic distribution per document and the term distribution per topic, respectively. $\theta_m$ parameterizes the categorical distribution of the document’s topic mixture, while the topic’s vocabulary weightings have a categorical distribution with parameter $\Phi_z$.


by employing a Gibbs sampling algorithm – a commonly used method of iteratively sampling until convergence is reached – because the model’s posterior distribution cannot be computed directly.12 We run the sampling using the topicmodels package in R (Hornik et al. 2011).
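To illustrate what a collapsed Gibbs sampler does, the following is a minimal Python sketch. It is not the topicmodels implementation we use; it is an independent illustration of the standard collapsed update (Griffiths and Steyvers 2004), in which each token’s topic is resampled from its full conditional with $\theta$ and $\Phi$ integrated out. The function name, the hyperparameter defaults, the iteration count, and the choice to read estimates off the final sample (no burn-in or thinning) are all assumptions made for brevity.

```python
import random

def lda_collapsed_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of tokens.
    Returns (theta, phi): per-document topic proportions and per-topic word
    distributions, estimated from the final sample's counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # tokens per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):       # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights, k=1)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[v]: (nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)}
           for k in range(K)]
    return theta, phi
```

In practice, an established package is preferable to a hand-written sampler; the sketch is meant only to show which counts the algorithm tracks and how `theta` and `phi` are read off them.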

Cleaning and Preprocessing

A number of preprocessing steps are necessary to ensure that an LDA model produces coherent topics. In particular, for our oral history transcripts, we used only text spoken by the interviewee so that we do not simultaneously model the thoughts and opinions of the interviewer. Also, because 38 of the CEOs were interviewed in a language other than English (specifically, Spanish, Portuguese, or Turkish), we used the English translations of the interview transcripts as our input data. We acknowledge that this may be a limitation of our approach, as our model might be capturing a translator’s own interpretations of a CEO’s words rather than the CEO’s full expression in her native tongue. Our regression analysis attempts to account for this potentially confounding factor by controlling for the CEO’s national origin, but we recognize that this limitation would not exist if all of the interviews had been conducted in the same language.

In addition, standard steps in this cleaning process include converting all text to lowercase and removing punctuation and numeric characters. Another common step is to remove all “stop words” – that is, common words such as “and” or “the” that give no relevant information about topic probabilities. Finally, stemming of words to their root form –

12 Because the underlying estimation problem is intractable, a number of approximation methods are typically used in estimating the LDA model, most commonly expectation-maximization algorithms and Markov chain Monte Carlo (MCMC) sampling methods (Yao, Mimno et al. 2009). In this analysis, we employ one of the MCMC methods, collapsed Gibbs sampling. This is a variant of the standard Gibbs sampling algorithm, which iteratively samples each variable from its conditional distribution given all the others. By collapsing out (i.e., integrating over) the parameters that carry Dirichlet priors, the algorithm encourages faster convergence (van Dyk and Park 2008).


an algorithmically-assisted process by which “run”, “runner”, and “running” would all be

reduced to the stem “run” (Lovins 1968) – is often helpful in achieving coherent topics.
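The cleaning steps above can be sketched as a small pipeline. This is a minimal Python illustration under loud assumptions: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer. It is not the Lovins (1968) algorithm, and it only shows where stemming sits in the sequence.

```python
import re

# Tiny illustrative stop-word list (hypothetical); real analyses use a full list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "we", "it", "that"}

# Hypothetical suffixes for a crude stemmer. This is NOT the Lovins (1968)
# algorithm; it only marks where stemming fits in the pipeline.
SUFFIXES = ("ning", "ner", "ing", "ers", "er", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                    # convert to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # remove stop words
    return [crude_stem(t) for t in tokens]  # stem each remaining token
```

For example, `preprocess("The runner was running; we run 3 plants!")` yields `["run", "run", "run", "plant"]`.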

More crucially in the context of oral histories and interviews, the length of each

document can have a powerful influence on the interpretability of an LDA model. For example, a

very long document may contain so many subjects that it is difficult for the algorithm to identify

a coherent set of topics, since the document is treated as a single bag of words. A frequent step

with longer documents is to break down the document into smaller, semantically coherent

segments (commonly 500 or 1000 words), a process for which a number of algorithms exist

(Riedl and Biemann 2012). However, the turn-taking design of an oral interview, which we

described earlier, provides a natural structure by which to segment each document: removing the interviewer’s questions and treating each response as its own segment improves model performance significantly. The model then treats each segment of a transcript (i.e., each response to

an interviewer’s question) as its own stand-alone document.
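Segmentation by turn-taking can be sketched as follows. The transcript format here is an assumption made for illustration: we suppose each turn begins on a new line with a speaker label such as “INTERVIEWER:” or “INTERVIEWEE:”, and that unlabeled lines continue the current turn. The function name and labels are hypothetical; real transcripts would need their own parsing rules.

```python
def interviewee_segments(transcript, interviewee="INTERVIEWEE",
                         speakers=("INTERVIEWER", "INTERVIEWEE")):
    """Return one text segment per interviewee response.

    Assumes (hypothetically) that each turn starts with a line "LABEL: text",
    where LABEL is one of `speakers`; other lines continue the current turn.
    Interviewer turns are discarded.
    """
    segments, current, speaker = [], [], None
    for line in transcript.splitlines():
        text = line.strip()
        label, sep, rest = text.partition(":")
        if sep and label in speakers:      # a new turn begins
            if speaker == interviewee and current:
                segments.append(" ".join(current))
            speaker, current = label, []
            text = rest.strip()
        if speaker == interviewee and text:
            current.append(text)
    if speaker == interviewee and current:  # flush the final response
        segments.append(" ".join(current))
    return segments
```

Each returned string would then be preprocessed and passed to the topic model as a stand-alone document.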

Output and Interpretation

Choosing the optimal number of topics for a topic model to produce over a set of

documents is often characterized as more of an art than a science. Measures of a model’s fit to

the corpus, such as perplexity and log likelihood, can provide some guidance. In this analysis, we

calculate the harmonic mean of the log-likelihood at various numbers of topics to pinpoint a

rough maximum (Griffiths and Steyvers 2004). It is worth noting that these measures do not

always line up exactly with human judgments of semantic coherence, and human judgment

remains the most popular way of selecting a final model (Chang, Boyd-Graber et al. 2009).

Coherence is typically best determined by examining the top most likely terms for each topic: a


good model should allow an observer to intuitively assign a title to each of the topics with a

quick glance at the most probable terms.
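The harmonic-mean criterion can be sketched as follows. Following Griffiths and Steyvers (2004), the marginal likelihood of the corpus for a given number of topics is approximated by the harmonic mean of the likelihoods from the post-convergence Gibbs samples; because these likelihoods are astronomically small, the computation is done in log space. The inputs (one list of sampled log-likelihoods per candidate number of topics) and the function names are assumptions for illustration.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Harmonic-mean estimate of log p(w | K) from sampled log-likelihoods
    log p(w | z^(s), K), computed stably in log space:
        log HM = log S - log sum_s exp(-ll_s)
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)  # log-sum-exp trick to avoid overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(log_likelihoods)) - log_sum

def pick_num_topics(ll_by_k):
    """Return the candidate K with the largest harmonic-mean estimate.
    ll_by_k: dict mapping K -> list of sampled log-likelihoods (hypothetical format)."""
    return max(ll_by_k, key=lambda k: log_harmonic_mean(ll_by_k[k]))
```

As noted above, the maximizing value is best treated as a rough guide, to be checked against the interpretability of each candidate model’s top terms.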

Once a final model has been chosen, the estimated topics can provide a number of

directions for interpretation. The proportion of words estimated to have come from each topic

may be used as a measure of topic prevalence per document. As our corpus structure consists of

long documents split into segments, we collapse each topic proportion back to the original

document – i.e., interview transcript – by weighting each segment by its length. This process

allows for comparison with interview-level covariates of interest. Finally, inter-topic

relationships – frequent co-occurrence or clusters of topics – may provide additional insight into

the oral histories.
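The length-weighted collapse from segments back to interview transcripts can be sketched as below. We assume, for illustration, that each segment comes with its estimated topic proportions, its length in tokens, and the identifier of its parent transcript; the function name and input layout are hypothetical.

```python
def collapse_to_documents(segment_props, segment_lengths, doc_ids):
    """Combine segment-level topic proportions into document-level proportions,
    weighting each segment by its length (e.g., its token count).

    segment_props: list of per-segment topic-proportion lists (same K for all).
    segment_lengths: list of segment lengths.
    doc_ids: list giving the parent document of each segment.
    """
    totals, lengths = {}, {}
    for props, n, doc in zip(segment_props, segment_lengths, doc_ids):
        acc = totals.setdefault(doc, [0.0] * len(props))
        for k, p in enumerate(props):
            acc[k] += n * p                 # length-weighted sum per topic
        lengths[doc] = lengths.get(doc, 0) + n
    return {doc: [v / lengths[doc] for v in acc] for doc, acc in totals.items()}
```

For instance, a 30-token segment with proportions [0.8, 0.2] and a 10-token segment with [0.2, 0.8] from the same transcript combine to [0.65, 0.35] for that transcript.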

Sentiment Analysis

Separate from the topic model, sentiment analysis is a valuable way to get a sense of the

emotional valence of a document. These methods are usually dictionary-based. The sentiment

measures in this paper are calculated using the syuzhet R package (Jockers 2015), which employs

crowd-sourced lexicons developed by Saif Mohammad at the National Research Council (NRC) of Canada (2013). These lexicons correspond to eight primary emotions: anticipation, fear, joy, sadness, trust, surprise, disgust, and anger. For each emotion, each term in the lexicon has a binary value

for association. We sum the terms associated with each of the eight emotions at the sentence

level, and then calculate the proportion of each document dedicated to each emotion, so that the

values sum to one.
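The proportion calculation can be sketched as follows. This is not the syuzhet implementation; it is an independent Python illustration of the same idea. The lexicon entries below are hypothetical placeholders chosen for the example, not actual entries from the NRC lexicon, and the simple whitespace tokenization is an assumption.

```python
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon: word -> set of emotions it is associated with
# (binary membership). These are placeholders, NOT real NRC entries.
TOY_LEXICON = {
    "hope": {"anticipation", "joy", "trust"},
    "scam": {"anger", "disgust", "fear"},
    "sudden": {"surprise", "fear"},
    "problem": {"fear", "sadness"},
}

def emotion_proportions(sentences, lexicon=TOY_LEXICON):
    """Count lexicon hits per emotion in each sentence, sum across sentences,
    and normalize so the eight proportions sum to one."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for sentence in sentences:
        for word in sentence.lower().split():
            for emotion in lexicon.get(word.strip(".,;:!?\"'"), ()):
                counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:  # no emotion words at all
        return dict.fromkeys(EMOTIONS, 0.0)
    return {e: c / total for e, c in counts.items()}
```

With this toy lexicon, `emotion_proportions(["There was a sudden scam.", "We have hope."])` assigns 0.25 to fear (two of eight hits) and 0.125 to each of the other six matched emotions.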


Independent Variables: Event Study Methodology and Days since Peak Market Return

One of our major explanatory variables was the cumulative average abnormal return in the period prior to the interview, measured using event study methodology, a method frequently used to study the financial impact of external events (Bromiley and Marcus 1989, McNichols and Dravid 1990, Hendricks and Singhal 1996). Among the firms corresponding to our set of CEO interviews, this variable could be calculated only for the 48 public firms for which we could find the necessary financial data. Return data were gathered from Bloomberg

using the public common equity of the target firm where available and the public common equity

of the parent when the firm equity was not available. Returns at the index level were then

gathered from Bloomberg for each of the corresponding stock indices related to the equity data.

All returns were reported in USD and taken between January 1st, 2005 and August 28th, 2017.

The interview date was assigned as the event date. Using the methodology outlined in Fama et al. (1969), residual analysis was carried out using an estimation window running from 170 days to 20 days before the event date. The abnormal return was calculated by subtracting the value predicted by the estimated model from the actual daily return. Abnormal return values were kept over the event window, which ran from 19 days before the event to the event date.
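The calculation can be sketched in Python as below. The text does not spell out the form of the expected-return model, so this sketch assumes the standard market model, $r_{\text{firm}} = a + b\, r_{\text{index}} + e$, fit by OLS over the estimation window. It also assumes that the firm and index return series are aligned day by day and that offsets count positions in those series (e.g., trading days). The function name is hypothetical.

```python
def market_model_car(firm_returns, index_returns, event_idx,
                     est_window=(-170, -20), event_window=(-19, 0)):
    """Cumulative abnormal return under an assumed market model.

    Windows are inclusive offsets relative to event_idx, the position of
    the event (interview) date in the aligned return series.
    """
    lo, hi = event_idx + est_window[0], event_idx + est_window[1]
    if lo < 0:
        raise ValueError("estimation window starts before the data")
    x, y = index_returns[lo:hi + 1], firm_returns[lo:hi + 1]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # OLS slope and intercept of firm returns on index returns.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    # Abnormal return = actual return minus the model's predicted return.
    start, end = event_idx + event_window[0], event_idx + event_window[1]
    abnormal = [firm_returns[t] - (a + b * index_returns[t]) for t in range(start, end + 1)]
    return sum(abnormal)
```

If a firm tracks its index exactly during the estimation window but outperforms the model by 1 percentage point on each of the 20 event-window days, the function returns a CAR of about 0.2.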

The other major explanatory variable of interest was the number of days since the date of

maximum returns for the index of the country of each interviewee. For each of the firms with

return data, the maximum value of returns was determined for the entire period of available data

between January 1st, 2005 and the event date, as well as over the 12 months and 6 months prior to the

event date. We then calculated the number of days between the date of peak value and the

interview date.
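This measure can be sketched as follows. The function below takes a generic series of (date, value) pairs, which stands in for whatever index series is used, and computes the days between the date of its maximum and the event date. The optional `window_start` argument, a hypothetical parameter, handles the 12-month and 6-month variants; ties go to the earliest peak.

```python
from datetime import date

def days_since_peak(series, event_date, window_start=None):
    """Days from the date of the series' maximum value to the event date.

    series: list of (date, value) pairs. Only observations on or before
    event_date (and on or after window_start, if given) are considered.
    """
    eligible = [(d, v) for d, v in series
                if d <= event_date and (window_start is None or d >= window_start)]
    peak_date, _ = max(eligible, key=lambda dv: dv[1])
    return (event_date - peak_date).days
```

For example, with values of 100, 120 and 110 on January 1, March 1 and June 1, 2017, an interview on July 1, 2017 lies 122 days after the peak, or 30 days if only observations from April 1 onward are considered.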


Results

Topic Model and Descriptive Observation



Figure 3 displays the top seven most likely terms for each topic for the final model used

in the analysis. Because the terms are stemmed, plural and verb endings are removed – for

example, the words “industries”, “industry”, and “industrialize” would be represented by the

stemmed version, “industr.” We have given labels to each topic according to our subjective

judgments of the primary subject of each topic. Many topics are industry-specific; for example,

Topic 1 appears to be concerned with manufacturing (“industr”, “plant”, and “technology” are

the top three stemmed words for Topic 1), Topic 16 centers on energy (“oper”, “power”,

“distribute”, “oil”), and Topic 30 appears to be about textiles and fashion (“women”, “design”,

“sari”, “visit”). Other topics appear to concern more general work-related subjects, such as

Topic 2, which seems to be about corporate social responsibility, Topic 3, which seems to be

concerned with hiring and human resources, and Topic 26, which has terms related to

management boards. Finally, a number of topics are more personal and appear to be concerned

with family (Topic 21), emotions (Topics 10 and 31), or life-related challenges (Topic 20).

Regressing indicators for region on the topic proportions can provide us with a sense of

which topics are most uniquely relevant for the three major regions in our sample: Asia, Africa

and Latin America. Figures 4, 5, and 6 display the coefficients from these regressions, sorted by

magnitude. These results serve primarily as a confirmation of the methodology. As expected,

certain industry topics and subjects are more prevalent for certain regions; for example, the

textile- and tea-related topics are associated with Asia, the transcripts from African executives


are more likely to discuss economic development and government, and the topics related to

tourism and mining are most predictive of Latin America. This coincides with our knowledge of

this particular sample of interviews, in which selection was geared toward well-known, iconic

examples of executives in a given country.


---------------------------------------------------------------------------------
INSERT FIGURE 5 ABOUT HERE
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
INSERT FIGURE 6 ABOUT HERE
---------------------------------------------------------------------------------

Cumulative Abnormal Returns

The event study methodology provides us with an opportunity to examine the cumulative

abnormal returns (CAR) just prior to the interview date. Although we do not observe the actual event that may have caused any abnormal stock returns, we assume that the interviewee is well

aware of the aberration and that it may affect their attention and sentiments, as well as the topics

discussed.


Table 2 displays correlations between our measure of abnormal returns and the proportions of each transcript associated with the eight primary emotions from the NRC sentiment lexicon: anticipation, fear, joy, sadness, trust, surprise, disgust and anger. The NRC measures are useful in that they allow for finer distinctions than simply positive and negative sentiment. While we observe that the abnormal returns have positive relationships with the positive emotion categories and negative relationships with the negative emotion categories, there are some notable differences. The largest correlations are with anticipation (p=0.06, n=47), fear (p=0.09, n=47), and surprise (p=0.03, n=47) – all sentiments that are more likely to be associated with unexpected events than with matters of course. Figure 7 displays plots of these relationships.



Beyond the NRC sentiment measures, we examined the correlations between the

abnormal returns and several groupings of topics. Table 3 displays these correlations. The

“work” and “life” categories are two discrete categories that divide the topics into subjects

related to work versus subjects related to the interviewee’s life, and the “emotions” category is a

subset of the life-related topics that are specifically related to emotion (Topic 10: Emotion and

Gratitude, Topic 20: Challenges, Topic 21: Family, Topic 28: Growth, and Topic 31: Emotion).

There is a positive relationship between the abnormal returns measure and the proportion of the

interview that is devoted to talking about work.
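The category measures can be sketched as sums of topic proportions. The emotion-related topic numbers below are the ones listed in the text; the full list of work-related topics is not given here, so `WORK_TOPICS` is a hypothetical subset drawn from the topics described earlier, used only for illustration. Because “work” and “life” partition the topics, the life share is taken as the remainder.

```python
# Emotion-related topics as listed in the text.
EMOTION_TOPICS = {10, 20, 21, 28, 31}
# Hypothetical subset of work-related topics, for illustration only.
WORK_TOPICS = {1, 2, 3, 16, 26, 30}

def category_share(topic_props, topics):
    """Share of a document (or segment) assigned to a group of topics.
    topic_props: dict mapping topic number -> estimated proportion."""
    return sum(p for k, p in topic_props.items() if k in topics)

def work_life_emotion_shares(topic_props, work_topics=WORK_TOPICS,
                             emotion_topics=EMOTION_TOPICS):
    work = category_share(topic_props, work_topics)
    life = sum(topic_props.values()) - work   # work and life partition the topics
    emotions = category_share(topic_props, emotion_topics)  # subset of life
    return work, life, emotions
```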



Finally, we examine the effects of the abnormal stock returns on the work and life topics

groupings using ordinary least squares (OLS) regressions at the segment level. Table 4 displays

the results of these models, adding in fixed effects for years (Models 2 and 5) and indicators for

region (Models 3 and 6). Model 3, the full model predicting work-related topics, for example,

has the following specification:

$$\text{Work}_{ij} = \beta_0 + \beta_1 \text{CAR}_j + \beta_2 \text{Asia}_j + \beta_3 \text{Africa}_j + \text{Year}_j + \varepsilon_{ij}$$


$\text{Work}_{ij}$ represents the proportion of segment $i$ in document $j$ estimated to belong to work-related topics and $\beta_1$ is the coefficient estimate for the effect of $\text{CAR}_j$, the cumulative abnormal returns. $\beta_2$ and $\beta_3$ represent the coefficient estimates for the effects of the Asia and Africa regions, respectively, while $\text{Year}_j$ represents fixed effects for each year in our sample.
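The specification above can be sketched in plain Python by building a design matrix and solving the normal equations. The text does not state which software fits these models, so this is only an illustration under assumptions. Latin America is taken as the omitted region, the first year in `years` is the omitted year category, and `design_row` and `ols` are hypothetical names. Standard errors are not computed here.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    A = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def design_row(car, region, year, years):
    """One row for Work_ij = b0 + b1*CAR_j + b2*Asia_j + b3*Africa_j + Year_j + e_ij.
    Latin America and years[0] are the omitted categories (assumed coding)."""
    row = [1.0, car, float(region == "Asia"), float(region == "Africa")]
    row += [float(year == yr) for yr in years[1:]]  # year fixed effects
    return row
```

Each segment contributes one row, with the document-level covariates repeated across all segments of the same interview.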

A one standard deviation increase in abnormal returns is associated with an increase of approximately 0.05 standard deviations in discussion of work-related topic categories, and a corresponding decrease in life-related categories. As a robustness check, we employed a hierarchical linear

modeling (HLM) approach to account for the possibility of autocorrelation between document

segments. The results of these models are discussed at the end of this section.
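The text reports effects in standard-deviation units without stating how they are computed. Assuming they are obtained by rescaling the raw OLS coefficient by the sample standard deviations of the regressor and the outcome, the conversion is:

$$\Delta_{\text{SD}} = \hat{\beta}_1 \cdot \frac{s_{\text{CAR}}}{s_{\text{Work}}}$$

Under this reading, a value of 0.05 means that raising $\text{CAR}_j$ by one sample standard deviation shifts the predicted $\text{Work}_{ij}$ by 0.05 of its own standard deviation, holding region and year indicators fixed.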



Days Since Peak Return

One of the shortcomings of the event study methodology is that only slightly more than half of

the interviews in this sample are associated with a publicly listed company. In order to make

better use of the private firms in our sample, we turn to the market index for the country

associated with each CEO interviewee’s firm, calculating the days since the date of peak market

return at the time of the interview. Higher values for this measure indicate more prolonged

periods of negative market performance. Once again, this is a measure that we expect may affect

both the attention and sentiment of the executives being interviewed.



Table 5 displays the correlations between the time since peak market returns and the

NRC sentiment categories. Most notably, there is a strong negative association between the days

since peak market returns and trust (p=0.002, n=82). Figure 8 displays this relationship

graphically. It is worth noting that while there is not a straightforward positive-negative

sentiment relationship, the association with trust is particularly strong. One interpretation of this result is that sustained weak market performance may reduce the CEO interviewee’s trust that the future will improve.



Finally, we examined the correlations between the time since peak returns and the topic

categories discussed above (work, life, and emotions). Here, we note a positive correlation between

the days since peak returns and the proportion of the document associated with emotions (p=0.02,

n=82). This dovetails with some of the relationships we observe above with the NRC sentiments,

in which the time since peak returns is associated not only with negative sentiments such as sadness

and disgust, but also positive sentiments such as joy. Table 6 displays these correlations, and

Figure 9 graphically displays the relationship between the days since peak returns and the emotion-

related topic categories.



Table 7 shows the results of an OLS model regressing emotion-related topic proportions

on the days since peak index returns, once again with the document segment as the smallest unit.

The full model (Model 3) follows the specification:

$$\text{Emotions}_{ij} = \beta_0 + \beta_1 \text{DaysSincePeak}_j + \beta_2 \text{Asia}_j + \beta_3 \text{Africa}_j + \text{Year}_j + \varepsilon_{ij}$$


This specification is identical to the OLS specification for the cumulative abnormal returns model, except that $\text{Emotions}_{ij}$ represents the prevalence of emotion-related topics in segment $i$ of document $j$, and the main effect estimated by $\beta_1$ is that of the days since peak index returns for document $j$.

According to Table 7, a one standard deviation increase in the time since peak returns

value appears to be associated with an increase of approximately 0.02 standard deviations in discussion of emotion-related topics, once region effects are included (Model 3).

To check robustness to possible correlation among segments within the same document, we examined several alternative specifications using a Hierarchical Linear Modeling approach (Raudenbush and Bryk 2002). Point estimates for both explanatory variables of interest were largely consistent with the OLS estimates when employing random intercepts models, as was a random slopes estimate of the effect of cumulative abnormal returns on the prevalence of work, life, and emotions topics. A random slopes specification with the days since peak returns variable did not converge. Estimated standard errors varied by specification: while the random slopes estimate of the coefficient on abnormal returns and the random intercepts estimate of the coefficient on days since peak returns were robust to the addition of year fixed effects, both estimates shrank and were no longer statistically significant when region indicators were added (p > .05, two-tailed test). These results are available upon request.
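The concern motivating these hierarchical models is that segments from the same interview are not independent. One simple diagnostic is the one-way ANOVA estimate of the intraclass correlation (ICC) of a segment-level variable, which approximates the share of its variance that lies between documents. The pure-Python sketch below is offered as an illustration, not as the authors' procedure. A value near zero suggests little clustering, while a larger value supports random-intercept models, which would themselves be fit with a mixed-model package (for example, statsmodels' MixedLM). The sketch assumes every document contributes the same number of segments; unequal group sizes call for an adjusted formula.

```python
from statistics import mean


def icc_oneway(groups):
    """ANOVA estimator of ICC(1) for balanced groups.

    groups: list of equal-length lists, one list of segment values per document.
    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), with k segments per document.
    """
    g = len(groups)
    k = len(groups[0])
    if any(len(values) != k for values in groups):
        raise ValueError("sketch assumes equal segment counts per document")
    grand = mean(v for values in groups for v in values)
    group_means = [mean(values) for values in groups]
    ss_between = k * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum(
        (v - m) ** 2 for values, m in zip(groups, group_means) for v in values
    )
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```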


Discussion

In this paper, we develop a new methodology that applies topic modeling and sentiment analysis to oral history data, which we argue is an underutilized resource in strategy scholarship. Our core contribution is this novel method, which could make oral history data more accessible to strategy researchers. For the purposes of illustration alone, we employ our methodology to study how environmental factors relate to CEO communication. We find that a one standard deviation increase in abnormal returns on the day of the CEO interview is associated with approximately half a standard deviation increase in discussing work-related content, and a corresponding decrease in life-related content. In addition, increases in cumulative abnormal returns are correlated with CEOs expressing more surprise and less fear in their language. We also find that a one standard deviation increase in the time since peak returns appears to be associated with approximately a tenth of a standard deviation increase in the discussion of emotion-related topics in CEO interviews. Finally, more time since peak returns is also correlated with a CEO’s tendency to use trust-related terms in her interview.
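The sentiment results rest on matching words against an emotion lexicon such as the NRC Word-Emotion Association Lexicon, which tags words with eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus positive and negative sentiment. The sketch below illustrates only the counting logic. The three-entry lexicon is a toy stand-in, not a copy of actual NRC entries, and scoring each emotion as a share of a segment's tokens is an assumed normalization rather than the paper's documented one.

```python
import re
from collections import Counter

# Toy lexicon for illustration only; the real NRC lexicon is far larger.
TOY_LEXICON = {
    "unexpected": {"surprise"},
    "worried": {"fear", "sadness"},
    "confident": {"trust"},
}


def emotion_shares(text, lexicon=TOY_LEXICON):
    """Share of the text's tokens associated with each emotion in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = len(tokens)
    return {emotion: n / total for emotion, n in counts.items()} if total else {}
```

A word can carry several emotions, so a single token may increment more than one count.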

Our results contribute to several literatures, notably the literature arguing in favor of historical analysis in strategy research; the literature on qualitative analysis in strategy research; and research on upper echelons, managerial attention, managerial cognition, and cognitive frames. Our results also contribute to the emerging literature on how CEOs spend their time.

The exposition of our novel methodology for using oral history data adds to the relatively thin literature on the use of historical data in strategy research. In particular, Jones and Khanna (2006) outline two dimensions of historical data that make it difficult to use in broad strategy research: such data are often “qualitative” and often “small sample”. The authors then suggest methods that strategy scholars could use to analyze historical data, including Boolean algebra (Ragin, 1987), string analyses (Abbott, 2001) and computational models (O’Rourke and Williamson, 1999). Oral history data often share the qualitative and small-sample properties outlined by Jones and Khanna (2006), and our methodology provides strategy scholars with another empirical tool for advancing historical analysis in strategy research. In effect, we show that even with a small sample of interviews (n = 88), segmenting each interview transcript allows for meaningful quantitative analysis through topic modeling. Because topic models tend to generate unstable and hard-to-interpret output when the input documents are long, the full text of a typical oral history transcript is generally not well suited for topic modeling; indeed, each of the interviews in our dataset lasts between one and two hours. However, by taking the additional step of segmenting each transcript based on its turn-taking structure, we demonstrate how to pre-process oral history transcripts for appropriate use with natural language processing techniques like topic modeling.
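Segmenting on turn-taking can be sketched as follows. The code assumes a hypothetical transcript convention in which each turn begins on a new line with an upper-case speaker label and a colon (for example, "INTERVIEWER:" or "CEO:"). Real archives format speaker labels differently, so the regular expression would need adapting. Unlabeled lines are treated as continuations of the current turn, and keep_speaker optionally retains only one speaker's turns, such as the interviewee's answers.

```python
import re

# Hypothetical convention: a turn starts with an upper-case speaker label
# followed by a colon at the start of a line, e.g. "INTERVIEWER:" or "CEO:".
TURN_START = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.*)$")


def segment_by_turns(transcript, keep_speaker=None):
    """Split a transcript into (speaker, text) segments, one per speaker turn.

    Continuation lines without a label are appended to the current turn.
    If keep_speaker is given, only that speaker's turns are returned.
    """
    segments = []
    speaker, lines = None, []
    for raw in transcript.splitlines():
        line = raw.strip()
        match = TURN_START.match(line)
        if match:
            # A new label closes the previous turn.
            if speaker is not None and lines:
                segments.append((speaker, " ".join(lines)))
            speaker = match.group(1).strip()
            lines = [match.group(2)] if match.group(2) else []
        elif line and speaker is not None:
            lines.append(line)
    if speaker is not None and lines:
        segments.append((speaker, " ".join(lines)))
    if keep_speaker is not None:
        segments = [seg for seg in segments if seg[0] == keep_speaker]
    return segments
```

Each resulting segment can then be tokenized and treated as a separate document for topic modeling.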

Our methodology also provides firms and strategy scholars with an empirical means to conduct “temporal search,” i.e., search for knowledge created at different points in the past. Such knowledge might be “frozen in time” in oral history or other historical records. Temporal search of historical data might help firms create competitive advantage through subsequent acts of innovation and organizational renewal. Analyses of such data using our methodology might help strategy scholars answer questions about how firms can use temporal search and analysis of the past to create competitive advantage in the future.

An important contribution of our study is the exposition of a replicable methodology for using qualitative data such as oral history. Our analysis is based on replicable algorithms and publicly available interview transcripts (all pre-processed interview transcripts are available from the authors upon request). The use of shareable transcripts and replicable topic modeling tools makes it possible to reproduce our analysis. This is unlike other qualitative studies, in which the full interview transcripts, field notes, and coder inputs are usually not available to other scholars.
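The reproducibility of a Gibbs-sampled topic model depends on fixing the random seed and the preprocessing, so that reruns yield identical topic assignments. The following is a minimal collapsed Gibbs sampler for LDA in the style of Griffiths and Steyvers (2004), written in pure Python for exposition. It is not the topicmodels implementation cited in the references, and the hyperparameter defaults are illustrative assumptions. Documents are lists of integer word ids.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    A fixed seed makes the whole run deterministic and therefore replicable.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    doc_topic = [[0] * n_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in topics]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                     # tokens assigned to topic k
    assign = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Posterior mean estimate of each document's topic proportions (theta).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
```

Running the function twice with the same seed returns identical proportions, while changing the seed may change them.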

More broadly, our results (though not the core contribution of our paper) contribute to the literature on managerial attention. In this literature, Ocasio (1997, 2011) builds on Simon (1947) to outline the premise of “situated attention,” which posits that the answers decision makers focus on depend on the particular context or situation in which they find themselves. These situated answers in turn manifest in “procedural and communication channels” such as action memoranda and quarterly and annual reports. However, action memoranda and quarterly and annual reports may be written by employees of the corporate communications team and may be sanitized prior to publication. By contrast, oral history data and CEO interviews provide a relatively unfiltered peek into the situated attention of the CEO and arguably represent an underutilized communication channel that scholars of managerial attention should study.

Our methodology could also be employed more broadly in strategy research on upper echelons, managerial attention, managerial cognition, and cognitive frames. One of the core propositions of the literature on upper echelons (Hambrick and Mason, 1984; Hambrick 1994) is that managers act on the basis of their personalized interpretations of the strategic situations they face; however, the literature has not outlined a precise methodology for measuring managers’ cognitive frames. To quote Hambrick (2007), “demographic characteristics of executives can be used as valid, albeit incomplete and imprecise proxies of executives’ cognitive frames” (Hambrick 2007, page 335; italics added by authors). Arguably, topic modeling of oral history or other comparable interview data could provide a complementary toolkit for coding the cognitive frames of managers.

In more recent literature on cognitive frames, Kaplan (2008) defines frames as “means by

which managers make sense of ambiguous information from their environments.” In this

literature, Kaplan (2008) uses CEO letters to shareholders and content analysis to measure

managerial cognition. However, the author also alludes to other sources of data that could be

used to measure managerial cognition, including data obtained through CEO interviews, akin to

the oral history data we use. For Kaplan (2008), “other kinds of statements by CEOs, such as

those obtained through interviews or surveys, might initially appear to be attractive (data)

sources, but they are impractical for larger samples of firms over long periods” (Kaplan 2008,

page 679). One of the reasons oral history data have been “impractical” to use in strategy research so far has been the absence of a robust methodology for using such data. The methodology outlined in our paper is a step in that direction.

Our methodology could also be employed in strategy research on interpretation. In this stream of research, Barr (1998) traces managers’ interpretations over time as they grapple with environmental events; the author uses CEO letters to shareholders as well as copies of CEO speeches from the Wall Street Transcripts to conduct causal reasoning analysis (Axelrod 1976; Huff et al., 1990) and cluster analysis using Ward’s (1962) method. Arguably, analyzing oral history with topic modeling and sentiment analysis could provide researchers grappling with such questions with a complementary source of data and a complementary analytical toolkit.

We also contribute to the literature on how CEOs spend their time. In a recent study in this literature, Bandiera et al. (2017) report that one of the most important activities on which CEOs spend their time is communication, both inside and outside the organization. Our results indicate that CEO communication is related to market and firm performance. It is also plausible that market and firm performance are correlated with how CEOs choose to allocate their time.

Our study has several limitations. First, because our data are limited to interviews with CEOs of firms in emerging markets, we cannot generalize our findings on how our CEOs’ emotions and topical attention relate to their firms’ economic environment to CEOs of firms in developed or under-developed economies. In other words, it is possible that in a developed economy, a CEO’s emotions might not be as sensitive to greater cumulative abnormal returns as they would be for CEOs in emerging markets. We encourage researchers to apply our methods in future projects that might examine such a comparison. In addition, in terms of data limitations, as Kaplan (2008) states, the study of oral interview data suffers from the risk of retrospective bias, as managers would likely adapt their memories of their views in prior years to subsequent outcomes. We partially circumvent this issue by employing our data and methodology to study how market outcomes affect memories (i.e., how abnormal returns on the day of the CEO interview affect memory and CEO communication), rather than studying how memories of events are related to outcomes. Our methodology is also limited by the fact that the machine learning process uses only text and cannot use video or audio material. In analyzing CEO emotions, it is plausible that coders using video or audio material would be better able to “visualize” emotions such as disgust in the facial expressions and/or voice intonations of the CEO.

As for other technical limitations, we can account only for differences in the region-of-origin of our CEO interviewees and the firms they represent. However, as a feature of the interview data collection, the CEOs’ regions are also associated with whether or not the interviews themselves were conducted in English. For instance, most CEOs from South American countries were interviewed in their native Spanish, which meant that our analysis could only incorporate the English translations of their interview transcripts. Future research might look into the sensitivity of topic model results to translation effects. Finally, although our approach used unsupervised LDA to estimate topic models, it is possible that a supervised approach could produce more meaningful topic estimates (Ramage et al., 2009). A supervised approach would require researchers to read through a sample of transcripts and to associate certain words with pre-determined topics, giving the topic model a fixed prior for structuring the relationship between estimated topics. A supervised approach is encouraged when the language used in a corpus of documents contains excessive jargon, such that relevant experts would be able to identify which specific and salient words should cohere together as a topic. The language in our interviews arguably does not reflect excessive use of jargon, but it is possible that other oral business histories exhibit higher proportions of industry-specific terminology.
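To illustrate what associating certain words with pre-determined topics can look like, here is a sketch of one simple variant, often called seeded or guided LDA. It builds an asymmetric topic-word Dirichlet prior that gives each topic's seed words extra prior mass. This technique is substituted purely for illustration. It is not Labeled LDA (Ramage et al., 2009), which instead restricts each document's topics to its observed labels. The seed lists, base value, and boost value are hypothetical.

```python
def seeded_topic_word_prior(vocab, seed_words, n_topics, base=0.01, boost=1.0):
    """Return an n_topics x len(vocab) matrix of Dirichlet prior weights.

    seed_words maps a topic index to words that should anchor that topic;
    those entries receive base + boost, and all other entries receive base.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(n_topics)]
    for topic, words in seed_words.items():
        for word in words:
            if word in index:  # seed words absent from the vocabulary are skipped
                prior[topic][index[word]] += boost
    return prior
```

In a collapsed Gibbs sampler, such a matrix would replace the scalar topic-word hyperparameter. Each topic's normalizing term would then use the row sum of its prior weights instead of the vocabulary size times a single constant.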

In conclusion, we document a novel and replicable methodology for using qualitative data such as oral history in strategy research. Our methodology relies on easily available oral history transcripts and on replicable techniques, namely topic modeling and sentiment analysis. We also develop a proof of concept of our methodology and provide suggestive evidence that CEO communication is correlated with firm and market performance. This result is relevant for scholarship on how environmental factors affect managerial attention, managerial cognition, and the allocation of CEO time to communication. Most importantly, our methodology opens the door for strategy scholars to use easily available, yet underutilized, oral history archives around the world.


Selected References

Bandiera, O., Guiso, L., Prat, A. and Sadun, R., 2017. What do CEOs do? Review of Financial Studies, forthcoming.

Bandiera, O., Lemos, R., Prat, A. and Sadun, R., 2013. Managing the family firm: Evidence from CEOs at work (No. w19722). National Bureau of Economic Research.

Barber, B.M. and Lyon, J.D., 1997. Detecting long-run abnormal stock returns: The empirical power and specification of test statistics. Journal of Financial Economics, 43(3), pp.341-372.

Barley, S.R., 1990. Images of imaging: Notes on doing longitudinal field work. Organization Science, 1, pp.220-247.

Blei, D.M., et al., 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3.

Bromiley, P. and Marcus, A., 1989. The Deterrent to Dubious Corporate Behavior: Profitability, Probability and Safety Recalls. Strategic Management Journal, 10, pp.233-250.

Calori, R., Johnson, G. and Sarnin, P., 1994. CEOs' cognitive maps and the scope of the organization. Strategic Management Journal, 15(6), pp.437-457.

Chang, J., et al., 2009. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems.

Chatterjee, A. and Hambrick, D.C., 2011. Executive personality, capability cues, and risk taking: How narcissistic CEOs react to their successes and stumbles. Administrative Science Quarterly, 56(2), pp.202-237.

D'Aveni, R.A. and MacMillan, I.C., 1990. Crisis and the content of managerial communications: A study of the focus of attention of top managers in surviving and failing firms. Administrative Science Quarterly, pp.634-657.

Daft, R.L., Sormunen, J. and Parks, D., 1988. Chief executive scanning, environmental characteristics, and company performance: An empirical study. Strategic Management Journal, 9(2), pp.123-139.

Delgado‐García, J.B., La Fuente‐Sabaté, D. and Manuel, J., 2010. How do CEO emotions matter? Impact of CEO affective traits on strategic and performance conformity in the Spanish banking industry. Strategic Management Journal, 31(5), pp.562-574.

Duncan, R.B., 1972. Characteristics of organizational environments and perceived environmental uncertainty. Administrative Science Quarterly, pp.313-327.

Dunning, J.H., 1998. American investment in British manufacturing industry. Taylor & Francis US.

Fama, E.F., et al., 1969. The adjustment of stock prices to new information. International Economic Review, 10(1), pp.1-21.

Fama, E.F. and French, K.R., 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), pp.3-56.

Gamache, D.L., McNamara, G., Mannor, M.J. and Johnson, R.E., 2015. Motivated to acquire? The impact of CEO regulatory focus on firm acquisitions. Academy of Management Journal, 58(4), pp.1261-1282.

Gao, C., Zuzul, T., Jones, G. and Khanna, T., 2017. Overcoming Institutional Voids: A Reputation‐Based View of Long‐Run Survival. Strategic Management Journal.

Griffiths, T.L. and Steyvers, M., 2004. Finding scientific topics. PNAS, 101.

Hambrick, D.C. and Mason, P.A., 1984. Upper echelons: The organization as a reflection of its top managers. Academy of Management Review, 9(2), pp.193-206.

Hambrick, D.C. and MacMillan, I.C., 1985. Efficiency of product R&D in business units: The role of strategic context. Academy of Management Journal, 28(3), pp.527-547.

Helfat, C.E. and Peteraf, M.A., 2015. Managerial cognitive capabilities and the microfoundations of dynamic capabilities. Strategic Management Journal, 36(6), pp.831-850.

Hendricks, K.B. and Singhal, V.R., 1996. Quality Awards and the Market Value of the Firm: An Empirical Investigation. Management Science, 42, pp.415-436.

Herrmann, P. and Nadkarni, S., 2014. Managing strategic change: The duality of CEO personality. Strategic Management Journal, 35(9), pp.1318-1342.

Hill, R.C. and Levenhagen, M., 1995. Metaphors and mental models: Sensemaking and sensegiving in innovative and entrepreneurial activities. Journal of Management, 21(6), pp.1057-1074.

Hiller, N.J. and Hambrick, D.C., 2005. Conceptualizing executive hubris: The role of (hyper‐)core self‐evaluations in strategic decision‐making. Strategic Management Journal, 26(4), pp.297-319.

Hornik, K. and Grün, B., 2011. topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), pp.1-30.

Huang, A., Lehavy, R., Zang, A. and Zheng, R., 2017. Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach. Management Science, forthcoming.

Jacobsen, R., 1988. The persistence of abnormal returns. Strategic Management Journal, 9(5), pp.415-430.

Jones, G. and Khanna, T., 2006. Bringing history (back) into international business. Journal of International Business Studies, 37(4), pp.453-468.

Jones, G., 2005. Multinationals and global capitalism: From the nineteenth to the twenty first century. Oxford University Press on Demand.

Khanna, T. and Palepu, K.G. (with Bullock, R.), 2010. Winning in emerging markets. Harvard Business Press.

Kogut, B.M. ed., 1993. Country competitiveness: Technology and the organizing of work. Oxford University Press on Demand.

Larcker, D.F. and Zakolyukina, A.A., 2012. Detecting deceptive discussions in conference calls. Journal of Accounting Research, 50(2), pp.495-540.

Lefebvre, L.A., Mason, R. and Lefebvre, E., 1997. The influence prism in SMEs: The power of CEOs' perceptions on technology policy and its organizational impacts. Management Science, 43(6), pp.856-878.

Lehavy, R., Li, F. and Merkley, K., 2011. The effect of annual report readability on analyst following and the properties of their earnings forecasts. The Accounting Review, 86(3), pp.1087-1115.

Lovins, J.B., 1968. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics, 11.

Loughran, T. and McDonald, B., 2016. Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), pp.1187-1230.

McNichols, M. and Dravid, A., 1990. Stock Dividends, Stock Splits, and Signaling. Journal of Finance, 45, pp.857-879.

Mintzberg, H., 1987. Crafting strategy (pp. 66-75). Boston, MA, USA: Harvard Business School Press.

O'Sullivan, M., 2001. Contests for corporate control: Corporate governance and economic performance in the United States and Germany. OUP Catalogue.

Portelli, A., 2009. What makes oral history different. Oral history, oral culture, and Italian Americans, pp.21-30.

Ramage, D., Hall, D., Nallapati, R. and Manning, C.D., 2009. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 248-256). Association for Computational Linguistics.

Raudenbush, S.W. and Bryk, A.S., 2002. Hierarchical linear models: Applications and data analysis methods (Vol. 1). Thousand Oaks, CA: Sage.

Riedl, M. and Biemann, C., 2012. Text segmentation with topic models. Journal for Language Technology and Computational Linguistics, 27(1).

Salancik, G.R. and Meindl, J.R., 1984. Corporate attributions as strategic illusions of management control. Administrative Science Quarterly, pp.238-254.

Saretzky, G.D., 1981. Oral History in American Business Archives. The American Archivist, 44(4), pp.353-355.

Shaw, R., 2015. Automatically Segmenting Oral History Transcripts. arXiv preprint arXiv:1509.08842.

Tasker, S.C., 1998. Bridging the information gap: Quarterly conference calls as a medium for voluntary disclosure. Review of Accounting Studies, 3(1), pp.137-167.

Thompson, P., 2017. The voice of the past: Oral history. Oxford University Press.

van Dyk, D.A. and Park, T., 2008. Partially Collapsed Gibbs Samplers. Journal of the American Statistical Association, 103(482), pp.790-796.

Vernon, R., 1966. International investment and international trade in the product cycle. The Quarterly Journal of Economics, pp.190-207.

Watzlawick, P., Beavin, J.H. and Jackson, D.D., 1967. Pragmatics of human communication.

Wilkins, M., 1970. The emergence of multinational enterprise: American business abroad from the colonial era to 1914 (Vol. 34). Cambridge, MA: Harvard University Press.

Wilkins, M., 1974. Multinational Oil Companies in South America in the 1920s: Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, and Peru. Business History Review, 48(3), pp.414-446.

Yadav, M.S., Prabhu, J.C. and Chandy, R.K., 2007. Managing the future: CEO attention and innovation outcomes. Journal of Marketing, 71(4), pp.84-101.

Yao, L., et al., 2009. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.


Figures and Tables

Figure 1: Overview of Methodological Process

Figure 2: Plate Diagram for Latent Dirichlet Allocation (LDA)

Figure 3: Top Terms for Topic Model

Figure 4: Topics Most Predictive of Asia Region

Figure 5: Topics Most Predictive of Africa Region

Figure 6: Topics Most Predictive of Latin America Region

Figure 7: Relationships Between CAR and Selected Sentiment Measures

Figure 8: Relationship Between Days Since Peak Returns and Trust Sentiment

Figure 9: Relationship Between Days Since Peak Returns and Emotion-Related Topics

Table 1: Oral History Archives

| Archive Name | Source | Notable and/or Relevant Projects |
| --- | --- | --- |
| Harvard University Creating Emerging Markets Initiative | http://www.hbs.edu/creating-emerging-markets/interviews/Pages/default.aspx | |
| UCLA Center for Oral History | http://oralhistory.library.ucla.edu/Browse.do?coreDescCvPk=27901&Subject=Business | Entrepreneurs of the West |
| Columbia University Oral History Archives | http://library.columbia.edu/locations/ccoh.html | Various interviews with executives and entrepreneurs |
| World Bank Oral History Archive | http://oralhistory.worldbank.org/ | |
| Indiana University Center for the Study of History and Memory | http://www.indiana.edu/~cshm/alphalist.html | Indiana Automobile Industry, Generations Auto Workers |
| University of California Berkeley Oral History Collection | http://www.lib.berkeley.edu/libraries/bancroft-library/oral-history-center/search-oral-histories | Venture Capital, Amgen, Biotechnology, Business archives |
| University of Connecticut Oral History | http://www.oralhistory.uconn.edu/catalog.html | Connecticut Workers and a Half Century of Technological Change, 1930-1980 |
| University of Kentucky Louie B. Nunn Center for Oral History | https://kentuckyoralhistory.org/ | Kentucky Entrepreneurial History Collection |
| British Library | https://www.bl.uk/collection-guides/oral-histories-of-business-and-finance | An Oral History of the Electricity Supply Industry, Prudential Interviews |
| The History Factory | http://www.historyfactory.com/ | The History Factory helps large firms chronicle their own histories through interviews |
| History Associates | https://www.historyassociates.com/who-we-serve/our-clients/ | Same as The History Factory |
| University of Florida Oral History Collections | http://ufdc.ufl.edu/ohfbl | Florida Business Leaders Oral History Collection |


Table 2: Correlations Between Abnormal Returns and NRC Sentiments

Table 3: Relationship Between Abnormal Returns and Topic Categories


Table 4: OLS Regressions of Cumulative Abnormal Returns on Topic Categories

Table 5: Relationship Between Days Since Peak Returns and NRC Sentiments


Table 6: Relationship Between Days Since Peak Return and Topic Categories

Table 7: OLS Regressions of Days Since Peak Index Returns on Topic Categories
