
Don’t Pretend to Test Hypotheses:

Lexicon and the Legitimization of Empirically-Grounded

Computational Theory Development *

Nicholas Berente

University of Georgia

Stefan Seidel

University of Liechtenstein

Abstract

The unprecedented availability of digital traces of human activity is already revolutionizing IS research. In this research commentary, we construct an epistemological foundation for how trace data can be used to develop novel and important theory. We draw upon lessons learned from Grounded Theory Method (GTM) and articulate a broader “grounded paradigm” that emphasizes the importance of a lexicon in the process of building theory. In doing so, we highlight the notion of emergence and provide a pragmatic epistemological foundation for different types of grounded analysis. We highlight the key iterative cycles of four activities that are pivotal in the generation of theory grounded in empirical data: sampling, synchronic (coding) analysis, lexicon reference, and diachronic (causal) analysis. On this basis, we propose a model that describes the process of theorizing from empirical data that also accommodates computational forms of data analysis. In constructing this framework, we hope to arm researchers with a legitimate approach for inductively generating theory from computational analysis of empirical data, and thus avoid recasting exploratory analyses in terms of hypothesis testing.

* This research commentary is under review at Information Systems Research


Don’t Pretend to Test Hypotheses:

Lexicon and the Legitimization of Empirically-Grounded

Computational Theory Development *

Introduction

The abundant and ever-increasing trace data now widely available offers boundless

opportunities for a computational social science (Lazer et al., 2009). Indeed, some expect the

widespread availability of a variety of trace data to do nothing less than revolutionize the so-

cial sciences (Latour, 2010) and challenge established paradigms (Lazer et al., 2009).

Through direct computational attention to trace data, researchers can potentially skip some

pre-established latent constructs (what Latour refers to as “imaginary”—see Latour, 2005)

and generate richer and more accurate understandings of social life—insights closer to the

source (Latour, 2010). Trace data typically requires computational tools for novel visualiza-

tions and pattern identification (Latour, 2010; Lazer et al., 2009), which provides ample fod-

der for predictive modeling (Shmueli and Koppius, 2011). But such models, patterns, and

visualizations are not theory (Agarwal and Dhar, 2014).

Trace data does offer an opportunity to generate novel theory, but when patterns are

inductively discovered, researchers have been known to retrofit hypotheses and present these

patterns under the guise of hypothesis testing (Locke, 2007). Norms, traditions, and pressure

to publish reinforce this widespread practice of recasting exploratory analysis in terms of hy-

pothesis testing, but this practice goes against the very tenets of strong hypothetico-deductive

research (Anonymous, 2015). There is simply no widely-accepted alternative to hypothetico-

deductivist methods when generating theory from trace data using computational tools, and

this exacerbates the problem. To unleash the power of trace data, scientists can benefit from a

legitimizing epistemology for inductive generation of novel theory. Grounded Theory Method


(GTM; Glaser & Strauss 1967) has offered this epistemological foundation, largely for quali-

tative research in the past, but it can also offer computational social science a similar legiti-

mizing foundation.

GTM is intended to enable novel theorizing from large amounts of data (Legewie and

Schervier-Legewie, 2004). Although GTM’s adherents typically work with qualitative data,

GTM was originally formulated to accommodate both quantitative and qualitative data

(Glaser, 2008; Glaser and Strauss, 1967).1 However, the method is extremely labor-intensive

and researchers cannot be expected to pore over gigabytes of trace data using coding strate-

gies commonly associated with qualitative data—it would simply be too manually intensive.

Similarly, trace data typically does not come in readily comparable numerical form, which

would lend itself to the sort of ordering that was recommended for quantitative data (Glaser,

2008; see also Glaser and Strauss, 1967). The lessons of GTM thus do not necessarily transfer

directly to the computational analysis of trace data. However, perhaps there is some value in

using GTM in a more fundamental sense—as a legitimizing epistemological position—since

it is the predominant existing method for generating theory from sets of unstructured data.

In this research commentary, we inquire into how the lessons from GTM apply to

computational analysis of trace data in an effort to generate theory. We develop a generalized

approach to legitimize inductive “grounded” theory development that can be applied to any

sort of data—including trace data. We root this approach in two key concepts from social

analysis: induction and generalization, and we then draw upon Habermas’s notion of “rational

reconstruction” to highlight the central role of a lexicon as a pre-theoretic foundation for any

theoretical knowledge. Highlighting this central role of the lexicon, we articulate a broad

“grounded paradigm” for information systems (IS) research that emphasizes the notion of

emergence—emergence of study design, categories, and theory—and thus legitimizes explor-

1 Of course, in the literature GTM has almost exclusively been used with qualitative data.


atory computational research, and helps avoid recasting exploratory research in terms of hy-

pothesis testing. This view of the grounded theory paradigm is not intended to supplant GTM

for qualitative analysis. Rather, it is intended to leverage the insights associated with GTM

more widely, and to provide a pragmatic epistemological basis for all sorts of grounded anal-

ysis. This might inform GTM research—particularly by explicitly attending to the role that a

lexicon plays in theory generation—but is primarily intended to make the key ideas portable

for opportunities proffered by the computational analysis of trace data.

We proceed as follows. In the next section, we provide an overview of both GTM and the

computational analysis of trace data. We then highlight our view that the all-important meta-

phor of emergence is what constitutes the essence of GTM. This is followed by a discussion

of the process of theory generation to provide the basis for our subsequent framework devel-

opment, where we describe how theory can be built inductively based on any kind of data,

and where we explain the pivotal role of a lexicon in this process. We illustrate the applicabil-

ity of the proposed approach using two empirical studies published in the IS field, and discuss

our findings and implications. We conclude by reviewing the contributions of this effort.

Overview: GTM & Computational Analysis of Trace Data

In the middle part of the 20th century, Barney Glaser and Anselm Strauss sought to

bring together the rigorous empirical approach of the Columbia school of sociology with the

creative, but often less rigorous, approach of the Chicago school (Glaser and Strauss, 1967).

This was motivated largely by the desire to inspire new and innovative efforts at theorizing

rather than application of existing theories (what Strauss referred to as “working” these

existing theories, see Legewie and Schervier-Legewie, 2004) in a way that was legitimate and

acceptable in light of the prevailing, positivistic social science of the time. Thus the original

goal of the grounded theory effort might be characterized as twofold: (1) to encourage novel


theorizing; and (2) to develop an empirically-driven methodology to enable this novel theoriz-

ing.

The result was GTM, which has been one of the strongest catalysts for widespread ac-

ceptance of qualitative research as well as inductive theory building across a variety of social

science disciplines (Bryant and Charmaz, 2007; Eisenhardt, 1989). Grounded theory seeks to

develop theoretical concepts and relationships while being informed by intense analysis of

empirical data (Glaser and Strauss, 1967; Strauss and Corbin, 1990). The methodology offers

researchers a specific process for “coding” unstructured data to develop theory. This process

of coding involves multiple passes through the data, iteratively identifying concepts and cate-

gories (i.e., more abstract concepts) that become more general at each pass, and then iterative-

ly relating these concepts and categories to each other, resulting in the generation of theory.

The goal is to construct theoretical propositions rooted in empirical data (rather than existing

theory), which may or may not be subjected to subsequent testing. 2

Since Glaser and Strauss's seminal work on the discovery of new theory through this rigorous, empir-

ically-grounded methodology, IS researchers have adopted the perspective with varying de-

grees of faithfulness to the method (Matavire and Brown, 2011; Urquhart, Lehmann, and

Myers, 2010). Sometimes GTM is adopted quite rigorously, and other times it is invoked as a

sort of catch-all method for qualitative or interpretive research that does not necessarily attend

to the fundamental tenets of the method (Urquhart and Fernández, 2013). This has led to re-

cent calls to take the method more seriously in the IS field (Urquhart and Fernández, 2013;

Urquhart et al., 2010) as well as in other fields (e.g., Suddaby, 2006).

In recent years, this attention to GTM as a rigorous and increasingly well-understood

method has been incorporated into the review process of the IS field, and there is an estab-

lished discourse around reflective and appropriate application of GTM (e.g., Matavire and

2 It has been stated that grounded theory can be presented in both propositional and discussional form (Glaser

and Strauss, 1967; Strauss, 1987).


Brown, 2011; Seidel and Urquhart, 2013; Urquhart et al., 2010). These recent trends indicate

a maturation of the methodology in the IS field. With this maturation, however, it is important

to avoid the dogmatism that sometimes accompanies the establishment of discourses within

the IS field. Dogmatism may close down the discourse precisely when it should be evolving

and staying current (Ciborra, 1998). In primarily emphasizing the second goal of the ground-

ed theory movement—the establishment of a valid methodology—we must be careful not to

undermine the first goal—novel forms of theorizing. A move toward stringent application of

GTM as a method, however positive for the discourse concerning that particular approach,

may in some ways undermine the spirit of creativity that led to the method in the first place.

In this paper, we look to preserve the maturation of GTM as a method, but also to

carve out a space for alternative empirically-grounded inductive approaches that do not neces-

sarily follow the prescriptions of GTM, nor the traditionally qualitative approach usually as-

sociated with GTM. Particularly in the context of widely available trace data and computa-

tional social science, the unprecedented access to different forms of data can drive alternative

inductive approaches. Latour (2010), for example, argued that the current explosion of digital

trace data offers an unprecedented opportunity to explore empirical phenomena without the bag-

gage of traditional qualitative and quantitative approaches—they offer the potential for strong,

empirically-grounded inductive theorizing, yet cannot really be incorporated into the solidify-

ing discourse around GTM.

GTM in a Nutshell: The Importance of Emergence

Over the years, GTM has evolved into a contested “family” of methodologies, rather

than one, very specific method (Bryant and Charmaz, 2007). This family of methods is replete

with variants and rich in reflective discourse. There are disagreements on coding procedures

(e.g., Kelle, 2007a), the role of existing research (e.g., Jones and Noble, 2007), epistemologi-

cal foundations (e.g., Charmaz, 2000), and a host of other divisions (also compare Seidel and


Urquhart, 2013). At a very broad level, one can distinguish between ‘classical’ grounded the-

ory that is much in line with the original version of GTM as proposed by Glaser and Strauss

(Glaser and Strauss, 1967), and ‘evolved’ grounded theory as, for instance, proposed by

Strauss (1987), Strauss and Corbin (1990, 1998), or Charmaz (2000; 2006) in the form of

‘constructivist’ grounded theory. From a unifying perspective, however, the method can be

thought to involve a number of key elements. Anselm Strauss alludes to these key elements

when he describes the motivation behind writing the book together with Barney Glaser

(Glaser and Strauss, 1967) and founding the method:

The important thing about that book, which is still not understood by most people, is that… it really had

three different purposes. One was, we were trying to legitimate qualitative research in a period when it

wasn’t yet legitimated… The second reason we did it is that we wanted to attack people like Blau, Par-

sons, and Merton, because their theories were being taken over by students and younger sociologists…

they weren’t challenging the theories, they were just working them… they were dotting i’s and crossing

t’s of these theories. So we attacked so-called “received” theories that were defective…. The third rea-

son we wrote the book is that we wanted to put forward the idea of doing theory that was grounded

(Strauss 1994 in Legewie and Schervier-Legewie, 2004).

Thus, GTM’s current manifestations flow from these original purposes. First, GTM in-

volves studying qualitative data (Strauss and Corbin, 1998; Urquhart, 2013)—in fact, some

have argued that GTM was one of the key methodologies for establishing the legitimacy of

qualitative research in a number of fields (Bryant and Charmaz, 2007). Second, GTM is pri-

marily concerned with the generation of theory—preferably novel and interesting theory—

rather than just “working” pre-existing views. Third, this theory generation is a result of in-

tense empirical analysis. GTM is “grounded” in data—typically lots of data. Beyond these

three original motivations, there are a number of practices that are often explicitly associated

with GTM, including theoretical sampling (Glaser and Strauss, 1967; Hood, 2007; Morse,

2007), memo writing (Lempert, 2007; Strauss and Corbin, 1998), and the development and


use of emergent concepts (Glaser, 1978; Kelle, 2007a; Strauss and Corbin, 1998). However,

the three key areas—qualitative research, new theory development, and empirically grounded

induction—each always seems to be present in GTM efforts. This is interesting, particularly

since each of these fundamental tenets of GTM has been deemed problematic. Next, we brief-

ly describe challenges to each of these areas: (1) although it is typically a qualitative method,

GTM can also accommodate quantitative analysis; (2) although it is aimed at new theory,

GTM necessarily allows for existing theory; and (3) although GTM involves a rigorous pro-

cedure for induction, so do a variety of other methods:

Qualitative vs. Quantitative: Although GTM is credited with helping legitimize

qualitative research, its founders never intended for this process to be exclusively

qualitative. In their pioneering text, Glaser & Strauss (1967) spell out a procedure for

theoretically ordering quantitative data in “elaboration tables” to enable the generation

of new theory. Although qualitative data was most definitely the focus, Glaser &

Strauss (1967) went to great lengths to describe rigorous alternatives to the “ver-

ification” oriented research that is rooted in quantitative data. They suggested that

rich, quantitative data can also be explored inductively to identify patterns and gener-

ate (rather than test) hypotheses. Glaser continues to promote this same methodology

as a way to mine the realms of unused and inaccessible survey data that have been

generated over the years for further insight (Glaser, 2008), and others have begun de-

fining different roles for quantitative analysis in the pursuit of grounded theory

(Urquhart, 2013; Walsh, Holton, Bailyn, Fernandez, Levina and Glaser 2015).

Existing Theory: Glaser and Strauss originally sought to develop theories without

respect to existing theories in order to generate new insight. This view, however, has

been criticized because it is both unrealistic (people always have pre-existing theories

in their minds) and likely to lead to trivial or non-original results (if analyzing phenomena


without existing literature, researchers may reconstruct existing explanations)

(Urquhart and Fernández, 2013). And indeed, it has been argued that the understand-

ing of GTM as a purely inductive method does not hold in either the Straussian or

Glaserian schools of thought (Kelle, 2007a, 2007b). Reichertz (2007) observes that

since observation is always conditioned by implied theory, GTM is more abductive in

the sense of Peirce (1992). Proponents of constructivist grounded theory have argued

that grounded theories are constructed by researchers based on their interactions with

the field (Bryant, 2002; Charmaz, 2000; Charmaz, 2006), much in line with the prag-

matist origins of the method (Strübing, 2007).

Empirically-grounded Induction: In a (relatively) recent article, Jane Hood (2007)

distinguishes between GTM and what she describes as the “Generalized Inductive

Qualitative Model”—which is shorthand for other forms of case-based qualitative re-

search with inductive goals. According to Hood (2007), it is not the qualitative aspect,

nor the theory generation, nor the inductive empirical basis that marks grounded theo-

ry. Instead, it is the emergent nature of the theory hand-in-hand with research as it is

being conducted. Sampling should progress based on findings as data are continuously

compared (constant comparison) in a theoretically-driven way to generate further insight

(theoretical sampling) and should only end when no new findings are emerging (satu-

ration). In this respect, the concept of theoretical sensitivity (Glaser, 1978) has been

discussed intensively, and Glaser (1978; 2005) has introduced a total of 41 coding

families that are intended to help the researcher identify relationships in the data,

thereby sensitizing her to allow for emergence while avoiding the use of preconceived

ideas and conceptualizations. Grounded research grows organically from within, not

through a pre-structured plan. If a researcher goes to her data with a purposeful sample

based on pre-existing concepts and categories, then the design of the research does not


emerge, but is pre-specified. In GTM research, the “design, like the concepts, must be

allowed to emerge during the research process” (Strauss and Corbin, 1998, p. 33). Ei-

senhardt (1989), in her seminal work on theory building from case data, presents an

inductive approach that is characterized by its iterative and emergent nature.

On the basis of this brief analysis, we find that it is neither qualitative data, nor complete-

ly novel theory, nor the idea of induction that marks grounded theory. The key element appears

to be the idea of emergence. The design, the concepts, the sample, and the eventual theory,

are all expected to emerge through the creative, adaptive, yet rigorous processes of the re-

searchers with the data and the sample. In his seminal work on organizational theory, Bacha-

rach (1989) points out that “theory” is essentially a linguistic device that researchers use to

organize empirical data in a way that simplifies those complex data with the use of concepts,

and that asserts certain relationships among those concepts within some boundary conditions

and constraints.3 He points out that theoretical statements vary in their abstraction from the

data (induction) and generalizability to other contexts (Bacharach, 1989). Next, we attend to

these two dimensions (induction and generalization) to show how emergence can be consid-

ered in both traditional qualitative GTM and the computational analysis of trace data. In do-

ing so, we highlight the importance of theoretical codes as concepts that draw on reference

lexicons to generate theory—an idea consistent with Bacharach’s view of theories as linguis-

tic devices.

Generating Theory: Induction, Generalization, and Lexicon

The Continuum of Induction: Analysis to Explain & Analysis to Derive

It is a bit ironic that Glaser was a student of Robert Merton, one of the staunchest and

most high profile advocates for positivistic, functionalist social science. According to Merton

3 Note that the definition of “theory” is a contested issue (see DiMaggio, 1995; Sutton and Staw, 1995; Weick,

1995), but to conceive of theory in terms of general statements about the relationship among concepts is a com-

monly accepted view.


(1957), findings from the interpretation of existing data have the illusory advantage of plausi-

bility, but offer very little of value in terms of “compelling evidence value” (p. 93). Essential-

ly, Merton argued that while theory used to explain existing data may fit that data, there is

little reason (in his mind) to accept one theory instead of another that might also conceivably

fit that data. There are any number of possible explanations for a phenomenon, and

interpretive conclusions cannot claim to provide the “right” lens for the data. Of

course, a veritable army of interpretive researchers have debunked this view over the years—

from a variety of pragmatist, constructivist, and interactionist perspectives (e.g., Strauss and

Corbin, 1990, 1998) or phenomenological and ethnomethodological perspectives (e.g., Guba

and Lincoln, 1994; Van Maanen, 1979).

However, Merton did make an interesting distinction in the relationship of theory to

interpreted data. He describes the difference between using theory to explain vs. deriving the-

ory from the data (Merton 1957).4 When deriving theory, the researcher proceeds from obser-

vations via empirical generalizations to theory and thus reasons inductively (see also

Handfield and Melnyk, 1998). That is, concepts and relationships are derived from the data.

In explaining data with theory, on the other hand, consequences are deduced from a hypothe-

sis, and then compared with empirical observations. That is, existing concepts and relation-

ships are used to make sense of data. These two modes of reasoning are captured by the con-

tinuum of induction (Figure 1).

Explain: Existent theory explains data ↔ Derive: Completely emergent new theory

Figure 1. The Continuum of Induction

In practice, the distinction between “derive” and “explain” is blurry, and both types of

reasoning are typically applied in conjunction (Handfield and Melnyk, 1998)—although with

4 Glaser used the terms “generation” vs. “verification,” which can also be conceived in terms of “exploratory” or

“confirmatory” data analysis; see Walsh et al 2015.


varying emphases, as captured through the continuum. This is also the case in grounded theo-

ry studies, where the researcher starts with initial slices of data, but then compares emergent

concepts with incoming data (Glaser, 1978; Glaser, 2005; Strauss and Corbin, 1990), and

where the researcher might also be sensitized by existent, theoretical codes (Glaser, 1978;

Glaser, 2005). In building theory from trace data, the empirical basis upon which theory is in-

ductively derived increases, thus providing more opportunity to compare emergent concepts

as well as theoretical codes with incoming data. Using additional slices of data helps further

substantiate emergent concepts and relationships, and in turn leads to more accurate theoretical

accounts.

The Continuum of Generalization: Substantive & Formal Theorizing

Glaser & Strauss (1967) distinguished between substantive and formal theorizing.

Substantive theorizing involves highly contextual relationships between variables that are

situated in a particular context. Formal theorizing generates broader, more general theory. The

difference is thus in the level of abstraction (Kearney, 2007). From here, one can proceed from

substantive to (more) formal theory, as indicated in Glaser (1967) and elaborated on, for in-

stance, in Urquhart et al. (2010). Urquhart et al. (2010) provide a nuanced view of levels of

generalization in theorizing by distinguishing between bounded, substantive, and formal con-

cepts. Bounded concepts are intensely contextual, substantive concepts address a “middle

range” between the contextual and the general, and formal concepts are intended for broad

generalizations (see Figure 2). In building theory from trace data, the analyst might indeed

consider both substantive and formal areas of investigation. In the manual grounded theory

analysis of data, it has been suggested that researchers develop different substantive theories

in order to proceed to more general, or even formal, theory (Urquhart et al., 2010).


Substantive Theory: Narrow Generalization ↔ Formal Theory: Broad Generalization

Figure 2. The Continuum of Generalization

While the two dimensions have been discussed in seminal works on grounded theory

method, they are not only relevant to grounded theory studies, but indeed to any study that

focuses on emergence. In the next section, we will use Habermas’s rational reconstruction to

explain how theories in a grounded paradigm are developed through the elicitation of con-

cepts from empirical data using a pre-theoretic lexicon.

Rational Reconstruction, the Level of Theory, and the Level of Lexicon

Habermas (1983, 2003) conceived of social science in terms of a “rational reconstruc-

tion” of empirical phenomena within a particular community of researchers. In characterizing

social science in such a way, Habermas looked to cut a line between the objectivist paradigm

of scientific method and the subjectivist paradigm associated with interpretive positions

(Pedersen, 2008). The rational reconstructive perspective enabled Habermas to approach sci-

ence with an appreciation for rigor, evidence, and the methodological advantages of the scien-

tific method, without the naïve positivism that often accompanies it. Similarly, this view al-

lowed him to appreciate individual interpretations without falling into the relativism of “any-

thing goes.”

Rational reconstruction is essentially an epistemological position. Through a rationally

reconstructive view, communities of social scientists who share a “lifeworld” necessarily also


share some level of an intersubjective understanding of the world that they study (Habermas,

1983). Fundamental to this intersubjective understanding is a lexicon of shared vocabulary

involving key concepts that reflect the assumptions, history, and institutional context of the

community.

The lexicon is the scaffolding upon which the scientific community is constructed. To

Habermas, any knowledge is always with respect to a community, and made sense of (i.e.,

reconstructed) with respect to the lexicon of that community. This lexicon provides a “pre-

theoretic” grammar. As such the lexicon provides the basic concepts and relationships upon

which the community can construct a theoretical understanding. These “pre-theoretic” ele-

ments enable its members to communicate with each other and compare findings—

generalizations are always made with respect to a particular lexicon (Pedersen, 2008).

This notion of a pre-theoretic lexicon requires some explanation. A pre-theoretic lexi-

con is a relational concept that serves as the grammar for a community to build theory. For

example, at a very low level, the English language is a pre-theoretic lexicon for the English

speaking community—the English language comes complete with concepts, definitions, and a

variety of assumed relationships. In the 20th century scholars of a variety of disciplines devel-

oped a “general systems theory” to provide a pre-theoretic lexicon for their disciplinary find-

ings. For example, English-speaking scholars interested in social theory (such as Parsons and

Merton) used this systems lexicon to describe “social systems” which were complete with

structures, functions, institutions, norms, roles, culture, etc. Subsequent scholars drew heavily

upon this social structure language as the pre-theoretic lexicon for emergent theoretical per-

spectives such as Giddens’s structuration theory (with its particular lexicon of structure, agen-

cy, signification, legitimation, domination, etc.) and the social network perspective (with its

lexicon of nodes, ties, distance, centrality, homophily, etc.). Subsequent scholars use these

lexicons as a starting point (or pre-theoretic lexicon) for their theorizing. Figure 3 illustrates


how the English language as well as meta-theoretical frameworks and theories can provide

pre-theoretic lexicons in relation to different scholarly communities, and different areas of

investigation these scholarly communities theorize about. Each level of lexicon provides the

pre-theoretic grammar for the subsequent level.5

[Figure 3 depicts four levels of lexicon, each providing the pre-theoretic lexicon for the next, paired with example scholarly communities: the English Language (English speakers); General Systems Theory (English-speaking scholars from different disciplines interested in different types of systems); Social Systems Theory (English-speaking scholars from functionalist sociological and anthropological traditions); and Social Network Analysis (English-speaking scholars interested in theorizing about network social structures).]

Figure 3. Pre-theoretic Lexicons as Relational Concepts

When analyzing empirical data through a theoretical lens, scientists use a lexicon

shared by their community which provides ready-made constructs and statements of relation-

ships that they can then build upon. Specific theoretical positions are inherent in the particular

terminologies in their contexts of use. Habermas’s concept of rational reconstruction can be

used to describe how social scientists draw upon lexicons to analyze empirical phenomena in

relation to the above two continua of induction and generalization, and this leads to four ma-

jor modes of inquiry. From a broad perspective, in deriving theory, lexicons are generated or

5 Note that this depiction simplifies the issue. The social systems theory of Parsons, for example, does draw on

general systems theory, but also heavily draws upon existing sociological traditions (Max Weber in particular).

Similarly, although social network analysis finds its roots in Parsons’s ideas of social structure, it also draws

upon a number of other traditions. So a variety of pre-theoretic lexicons may be drawn upon to generate a lexi-

con at a higher level.


extended in either a substantive or formal setting, based on the level of generalization. The

result is a new or adapted lexicon. Using a lexicon to explain phenomena either substantively

(contextually) or more generally may result in new knowledge for that community, but does

so strictly within the bounds of that existing lexical framework. A framework of these four

modes of the relationship between theoretical lexicons and induction and generalization is

summarized in Table 1.

Table 1. Continua of Induction, Generalization, and the Level of Lexicon

Derive Explain

Formal Theory

Generate Lexicon: Develop or modify “general

theories”

Example: Extend Social Network Analysis (SNA)

to include multidimensional networks (Contractor,

Monge, and Leonardi, 2011)

Use Lexicon: Broadly apply general theories; use

lexicon to explain.

Example: Apply structuration theory to IS field

(Jones and Karsten, 2008)

Substantive

Theory

Generate Lexicon: Develop local or mid-range

theories

Example: Extend structuration theory to the area of

IT (Orlikowski, 1992)

Use Lexicon: Extend general theory in domain

Example: Apply SNA to open source software

(Grewal, Lilien, and Mallapragada, 2006)

Next, we provide examples of each mode, using structuration theory and social net-

work analysis as examples of lexicons commonly used in information systems.

Generate Lexicon for Formal Area of Inquiry: In deriving formal theory based on

empirical data, the lexicon must often be adapted to accommodate new modes of inquiry or

forms of data. Thus, while generating new theory, the researchers would need to extend the

original lexicon with new concepts. This may involve drawing on another lexicon. For exam-

ple, Contractor et al. (2011) extended SNA using the sociomaterial lens, thus resulting in a

new lexicon to explore “multidimensional networks.” SNA provided a pre-theoretic lexicon

that provided the grammar used for novel theorizing.

Generate Lexicon for Substantive Area of Inquiry: In deriving substantive theory

through a primarily inductive process, one generates a lexicon that serves as the scaffolding


that allows analyzing a substantive area of inquiry. An example is Orlikowski’s (1992) semi-

nal extension of structuration theory to accommodate IT artifacts in organizational contexts.

In this case, structuration theory served as a pre-theoretic lexicon that provided the grammar

for novel theorizing.

Use Lexicon in Formal Area of Inquiry: In using a formal theory in a broad domain,

a general lexicon is applied very generally. This does not extend the lexicon, but applies it and

makes general claims. An example in information systems research is Jones & Karsten

(2008), who reported on a review of structuration theory in the information systems literature.

The general lexicon provided by structuration theory (i.e., concepts and relationships) was

applied to an entire field.

Use Lexicon in Substantive Area of Inquiry: In applying an existing lexicon in a

specific domain, a more general model is used in order to explain the practices in a more nar-

row (i.e., substantive) field. For instance, Grewal et al. (2006) used the well-established SNA

concepts of “embeddedness” and related them to success in the context of open source soft-

ware development. Typical theory-testing studies would often fall into this category, as a gen-

eral lexicon is applied to derive hypotheses that explain observations made in a sample of a

population.

The four modes as described above are idealized cases, and in research practice we of-

ten use aspects of different modes. In the next section, we use the above framework (level of

induction, level of generalization, and level of lexicon) to describe how grounded theory can

be developed based upon both the manual analysis of qualitative data and the computational

analysis of trace data.

Trace Data and Generating Grounded Theory


Emergence: The Processes of Manual and Computational Grounded Theory

Traditional qualitative GTM begins with the world’s biggest dataset—the world it-

self—and reduces this dataset by sampling from the world in an area of interest. This sam-

pling should be theoretical—what is known as “theoretical sampling”—in that the sample

should be developed and extended based on the results of analyzing that existing sample

(Glaser and Strauss, 1967). In this view a smaller, initial sample should be taken and ana-

lyzed, then subsequent samples should be informed by this analysis—they should help to fol-

low-up on the insights that began to emerge from the initial sample. As such, the sample

emerges over time, and this emergence is informed by existing analysis.

The analysis of the sample similarly emerges over time, thereby following a coding

strategy that involves categorizing elements of the data and looking for associations among

these elements. Coding and analysis can follow a number of paths, the most well-known of

which is the open, axial, and selective coding cycles in Straussian GTM (Strauss, 1987;

Strauss and Corbin, 1990, 1998). When developing the resulting theory, the researcher draws

upon the language of an existing discourse—a pre-theoretic lexicon in the respective field—

such as the way that the information systems field, for example, might speak of phenomena in

question—the general meaning of terms (Gaskin, Berente, Lyytinen, and Yoo, 2014). Typical

pre-theoretic lexicons used in grounded theory are theoretical codes provided by Barney Gla-

ser (e.g., ‘amplifying causal looping’ or ‘conjectural causation’) or the coding paradigm pro-

posed by Strauss (1987) and later Strauss and Corbin (1990, 1998) that distinguishes between

causal conditions, intervening conditions, contextual factors, action/interactional strategies,

and consequences. These codes are sufficiently abstract to provide a suitable pre-theoretic

lexicon to a broad range of social research, and have been applied in areas such as nursing,

management, organization studies, and indeed information systems. The categories and their


associations developed in the coding process represent a pre-theoretic understanding of phe-

nomena, too—but at a more specific level pertaining to the substantive context that is studied.

The understanding and pre-theoretic lexicon are iteratively refined and generalized in

a process of sensemaking in order to generate theory. Sensemaking in the context of data

analysis is the process where “schemas” (mental models containing information about ob-

served objects) are compared to observed information and, if discrepancies are identified,

either the schema is updated or the information is disregarded (Grolemund and Wickham,

2014).6 In order to maintain such schemas, humans require lexicons providing the necessary

concepts and relationships upon which they can construct understanding.
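As a rough illustration of this schema-comparison logic, consider the following hypothetical sketch; the schema contents, field names, and update rule are invented for illustration and are not drawn from Grolemund and Wickham (2014).

    # A minimal, hypothetical sketch of sensemaking as schema comparison:
    # an observation is checked against a schema (here a simple dict of expected
    # attributes); discrepancies either update the schema or are set aside.

    def sensemake(schema, observation, update_threshold=0.5):
        """Compare an observation to the schema; update it or set the observation aside."""
        discrepancies = {key: value for key, value in observation.items()
                         if schema.get(key) != value}
        if not discrepancies:
            return schema, "consistent"                # the observation fits the schema
        if len(discrepancies) / len(observation) >= update_threshold:
            return {**schema, **discrepancies}, "schema updated"
        return schema, "observation set aside"         # disregarded for now

    schema = {"contributor_type": "core developer", "activity": "commits code"}
    observation = {"contributor_type": "peripheral", "activity": "commits code"}
    schema, outcome = sensemake(schema, observation)
    print(outcome, schema)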

The resultant theory might then be used as a pre-theoretic lexicon in theory generation,

or may indeed be quantitatively tested.

This process (see the left side of Figure 4 below) can be summarized in the following

steps:

(1) Initial sampling from the world

(2) Iterative coding to generate concepts and their associations, which constitute the

pre-theoretic understanding

(3) Iterative sensemaking relating the pre-theoretic understanding to an existing lexi-

con in the field to generate theory

That is, the primary goal of grounded theory development is to build or extend a lexi-

con in a substantive area of investigation while, at the same time, drawing on an existing lexi-

con in order to make sense of the data. However, it is important to note that the process is far

more iterative and emergent than one might conclude from this image. The constant compari-

6 Grolemund and Wickham (2014) focus on the cognitive processes of data analysis. Since our view (and that of

Habermas) is social theoretic, we make no claims about what goes on inside the skulls of individual researchers,

instead we look to their language as the shared artifacts of their scientific field. Thus lexicons account for shared

pre-theoretic understandings of a field, without implying any knowledge of any particular researcher’s cognition.


son among the data, the concepts and categories (pre-theoretic understanding), the lexicon,

and the resulting theory is highly iterative and emergent over time.

[Figure 4 depicts two parallel paths from the World: on the left, theoretical sampling yields a qualitative data sample that is coded into concepts and associations (a pre-theoretic understanding); on the right, theoretical extraction yields a trace data sample that is analyzed into patterns and correlations (a pre-theoretic understanding); in both paths, sensemaking in relation to a lexicon generates theory.]

Figure 4. The grounded theory process of theorizing from data

The same basic ideas can be applied to the computational analysis of trace data. In

such computational analysis, the same basic operations are used as in the manual (qualitative)

grounded theory analysis.

First, the world (the largest dataset) is represented by a digital trace, and the researcher

extracts some portion of this digital trace. Naturally, the sample drawn with the purpose of

being subjected to computational grounded analysis can be much larger than those samples typ-

ically subjected to manual grounded theory analysis. Examples include trace data such as

GPS signals from cell phones, social network data from Facebook or Twitter, sales data from

online shopping, data from machinery, etc. Then the researcher further extracts data in an

organized fashion either through predetermined categories (e.g., data with metadata) or by

generating categories during this extraction (e.g., map-reduce).
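To illustrate, the following is a minimal sketch of such an extraction step over hypothetical trace records in a map-reduce spirit; the record fields and the category scheme are assumptions made for illustration only.

    # A minimal, hypothetical sketch of extracting trace data into categories:
    # each raw record is mapped to a (category, record) pair, and records are
    # then reduced (grouped) by category for further analysis.
    from collections import defaultdict

    raw_trace = [
        {"user": "u1", "action": "commit", "timestamp": "2015-03-01T10:02:00"},
        {"user": "u2", "action": "comment", "timestamp": "2015-03-01T10:05:00"},
        {"user": "u1", "action": "commit", "timestamp": "2015-03-02T09:40:00"},
    ]

    def map_record(record):
        """Map a raw trace record to a coarse category (assumed scheme)."""
        category = "code_work" if record["action"] == "commit" else "discussion"
        return category, record

    def reduce_by_category(mapped_records):
        """Group mapped records by category."""
        groups = defaultdict(list)
        for category, record in mapped_records:
            groups[category].append(record)
        return groups

    extracted = reduce_by_category(map_record(r) for r in raw_trace)
    print({category: len(records) for category, records in extracted.items()})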


Holland and associates (1986) highlight the necessity of categorizing raw data into

useful concepts and then looking for relationships among these concepts in any computational

inductive process. They describe the categorization and raw association of concepts in terms

of “synchronic” regularities, and the causal ordering of these categories as “diachronic” regu-

larities. Computational researchers must first synchronically develop categories using similar-

ity analysis (e.g., clustering) and associations between these categories (e.g., covariances, see

Holland et al., 1986). Computational categorization is similar to grouping categories in manu-

al grounded theory analysis. Computational association through rule learning and identifica-

tion of correlations is similar to the identification of associations in manual grounded theory

analysis. That is, the grammar provided by a pre-theoretic lexicon is used in order to catego-

rize incidents or to identify associations. In cluster analysis, for instance, relevant criteria

must be known in advance, and these criteria are taken from a pre-theoretic lexicon. However,

existing inductive approaches to extracting understanding from trace data, such as the tech-

niques for data mining (Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, 1996), or predic-

tive analytics (Shmueli and Koppius, 2011), often (in practice) take a single pass at

extracting the sample, and then the analysis essentially stops with identified patterns, and thus

does not move beyond synchronic analysis. This understanding of correlations among con-

cepts may be considered a form of theory (e.g., see Shmueli and Koppius, 2011), but more

often, predictive models are thought largely to represent theoretic correlations in desperate

need of explanatory theory (Agarwal and Dhar, 2014). To develop theory, computational re-

searchers must bring causal understanding about predictors and effectors—both generated

from patterns, but then also tested and refined across the data. This theory generation requires

diachronic, temporal analyses (Holland et al., 1986). However, diachronic regularities do not

arise on their own. As a matter of fact, blunt, raw pattern identification often results in com-

putational “mud” that contributes nothing to our understanding of phenomena (Holland et al.,


1986, p. 323). To arrive at useful diachronic relations, and similar to manual grounded theory

analysis, researchers need to make sense of patterns, thereby drawing on schemas (Grolemund

and Wickham, 2014) or mental models (Holland et al., 1986) required for the sensemaking

process,7 which are drawn from the shared lexicons of a field. These mental models furnish

the fundamental concepts rooted in existing knowledge, upon which new knowledge can be

built: “The discovery of scientific law is almost always an extremely ill-defined problem, in-

soluble by general heuristics without the direction of a mental model of the problem domain”

(Holland et al., 1986, p. 325).

To become theory, computationally identified patterns (a form of pre-theoretic under-

standing) need to be iteratively related to the lexicon of a field in order to move the repertoire

of explanations accepted in the field forward, and thus generate relevant theory. Often this pro-

cess takes place implicitly, as scientists identify patterns, and only then look for the theoreti-

cal lens that accounts for these patterns—and write it up as if they were testing hypotheses

(Locke, 2007). Further, until these patterns are identified, it is often unclear which additional

data may be required for this theorizing. Therefore, rather than just a one-way process of data

extraction to analysis for correlation to pattern identification, for the generation of theory,

computational researchers can find guidance in the GTM approach. A grounded approach to

theory generation involves repeated sampling of a population iterating with rounds of correla-

tion analysis to identify patterns. Theoretical concepts and categories can “earn their way”

into the explanation over time (Strauss and Corbin, 1998)—no pretense of hypothesis testing

is necessary; the result is essentially an iterative “theoretical” extraction, informed by the patterns identified.

Researchers make sense of these patterns in relation to an existing lexical framework in order

to generate generalizable theory (see the right side of Figure 4).

7 Holland and associates’ (1986) use of a mental model draws on the psychological concept from Johnson-Laird

1980. Since the authors are cognitive scientists and psychologists this makes sense. But in accordance with Ha-

bermas’s rational reconstruction, these mental models find their roots in the shared lexicons of the relevant sci-

entific fields of practice. One might think of the lexicon as the mental model of a scientific field.


To summarize, the primary goal of grounded theory development using computational

methods is to build or extend a lexicon in a substantive or even formal area of investigation

while, at the same time, drawing upon an existing pre-theoretic lexicon within a scientific

community to make sense of the data.

A Method of Grounded Analysis

Bringing these ideas together, we can now propose an approach for grounded analysis, both manual and

computational, focusing on the role of the lexicon in enabling the generation of theory from

patterns identified in the data. Table 2 summarizes the main activities for both manual and

computational grounded analysis.

Table 2: Iterative activities in manual and computational grounded analysis

Sampling (goal: develop dataset)
Manual analysis: Theoretical sampling (e.g., collect additional data based on findings and analysis).
Computational analysis: Data extraction (e.g., map-reduce a large amount of trace data to a manageable volume based on goals and current knowledge).

Synchronic Analysis (goal: develop categories and associations)
Manual analysis: Coding, including the identification of similarities (open coding) and the identification of correlations (axial coding).
Computational analysis: Synchronic relations, including categorization (relationships among similar data) and associations (relationships of covariance among data).

Reference Lexicon (goal: draw upon grammar)
In both cases, the lexicon provides the pre-theoretic reference for the identification of patterns in relation to a goal, using the language and causal relations determined by one or more scholarly communities.

Diachronic Analysis (goal: generate theory)
In both cases, the generation of theory requires a sensemaking process; that is, the analyst decides—based on the empirical evidence—what concepts and relationships (pre-theoretic understanding) are included in a coherent theoretical scheme (theoretic understanding). This process might require an interaction between manual analysis and computational analysis.
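Before turning to the individual steps, a schematic sketch may help convey how these four activities iterate until saturation. The Python skeleton below is hypothetical; the function names and the stopping rule are placeholder assumptions, not a prescribed implementation of the approach.

    # A schematic, hypothetical sketch of the iterative cycle summarized in Table 2.
    # Each callable stands in for one activity; names and stopping rule are placeholders.

    def grounded_analysis(extract_sample, synchronic_analysis, reference_lexicon,
                          diachronic_analysis, saturated, max_rounds=10):
        theory = None
        sample = []
        for _ in range(max_rounds):
            sample += extract_sample(theory)                 # sampling / theoretical extraction
            patterns = synchronic_analysis(sample)           # categories and associations
            lexicon = reference_lexicon(patterns)            # pre-theoretic grammar drawn upon
            theory = diachronic_analysis(patterns, lexicon)  # causal ordering via sensemaking
            if saturated(theory, patterns):                  # e.g., no new concepts emerge
                break
        return theory

    # Trivial stand-ins, only to show the control flow:
    theory = grounded_analysis(
        extract_sample=lambda theory: [{"event": "commit"}],
        synchronic_analysis=lambda sample: {"code_work": len(sample)},
        reference_lexicon=lambda patterns: ["activity", "coordination"],
        diachronic_analysis=lambda patterns, lexicon: {"claim": "activity precedes coordination"},
        saturated=lambda theory, patterns: True,
    )
    print(theory)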

In what follows, we provide a more detailed description for each of the steps.


Sampling: At the outset, the analyst defines the area of investigation, thereby defining

the scope and boundary conditions of the intended theory development. Often this begins by

convenience—a dataset or study location is available; or by a phenomenon—some topic do-

main is “hot” at some point so researchers look to explore that domain. In early stages of re-

search, an initial sample is drawn. Data collection methods that are commonly used in tradi-

tional grounded theory include interviews or observation (Urquhart, 2013) and the process is

that of theoretical sampling, where expected theoretical insights guide what data is collected

next (Glaser and Strauss, 1967). This notion of theoretical sampling is one of the key ele-

ments of GTM: the researcher continues to develop the sample throughout the analysis pro-

cess. Additional sampling is typical as the researcher begins to gain insight from early sam-

pling. In computational analysis, the process is one of extraction that typically includes tasks

such as data cleaning (Fayyad et al., 1996) for structured data and map-reduce as a first step

in extracting unstructured data (Chen and Schlosser, 2008). From a grounded theory perspec-

tive, additional rounds of sampling would likely follow based on analysis of early samples.

Naturally, the sample for manual analysis will be smaller than for computational analysis in

most cases. Still, it is important to note that both in the computational and manual analysis of

data it is a sample that is analyzed.

Synchronic Analysis: After they draw each sample, researchers begin exploring the

data. In manual grounded theory this involves coding. In the earlier stages of manual ground-

ed theory, the researcher aims to identify first categories based on the similarities between

incidents as well as first co-occurrences of categories. In open coding (Strauss and Corbin,

1998), for instance, the analyst identifies categories by grouping similar incidents found in the

data under the same label. In axial coding (Strauss and Corbin, 1998) the analyst looks for


other categories (sub-categories) that co-occur with this category.8 That is, the analyst looks

for both similarities (grouping) and correlations (co-occurrence of categories and their subcat-

egories), thereby developing a pre-theoretic understanding.

In the computational analysis of trace data, both processes (identifying categories and

identifying associations) are typically referred to in terms of identifying synchronic relations.

These are relations of category (poodles and bulldogs are “dogs”) and relations of associa-

tion (“dogs” and “bones” go together, see Holland et al., 1986). A computational analyst can

choose among different exploratory techniques, for instance, cluster analysis in order to iden-

tify similarities (Anderberg, 1973) or association rule mining to explore different relation-

ships (Hipp, Güntzer, and Nakhaeizadeh, 2000). However, in order to turn these general asso-

ciations into theory, these categories and associations must be related to a particular lexicon

established in the field.
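As an illustration, the following is a minimal sketch of these two synchronic operations on a small, invented event-by-feature matrix; the feature names, the data, and the choice of scikit-learn's k-means are assumptions made for illustration only.

    # A minimal, hypothetical sketch of synchronic analysis on trace data:
    # clustering suggests relations of category; co-occurrence counts suggest
    # relations of association. Data and parameters are purely illustrative.
    import numpy as np
    from sklearn.cluster import KMeans

    # Rows are traced events; columns flag the presence of assumed features.
    features = ["commit", "bug_report", "forum_post", "release"]
    events = np.array([
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [1, 1, 0, 1],
    ])

    # Relations of category: group similar events (here, into two clusters).
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(events)
    print("cluster labels:", labels)

    # Relations of association: how often do pairs of features co-occur?
    co_occurrence = events.T @ events
    print("co-occurrence of", features[0], "and", features[3], ":", co_occurrence[0, 3])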

Reference Lexicon: Iterating with the coding and continued sampling (as necessary),

the analyst settles upon the lexical frameworks to be used to analyze the data—that is, the pre-

theoretic lexicon providing an appropriate grammar to analyze the data. In the analysis pro-

cess, the researcher may consider different pre-theoretic lexicons throughout the process, and

the lexicon-in-use may change. Since grounded theory is exploratory in nature, and no specif-

ic concepts are chosen in advance to be imposed on the data, the pre-theoretic lexicon needs to be

sufficiently abstract to allow for emergence. The analyst might, for instance, select certain

theoretical codes (that indeed constitute a pre-theoretic grammar) in order to enhance her the-

oretical sensitivity such as Glaser’s coding families (Glaser, 1978; Glaser, 2005). Examples

from this pre-theoretic lexicon include the 6Cs (causes, contexts, contingencies, consequenc-

es, covariances, and conditions) or the interactive family (mutual effects, reciprocity, mutual

trajectory, mutual dependency, interdependence, interaction of effects, covariance). An alter-

8 Please note that in axial coding the analyst also starts to identify whether the categories are indeed conditions

or consequences, and the lines between synchronic and diachronic analysis blur.


native is the paradigm model proposed by Strauss (1987) and later Strauss and Corbin (1990,

1998) and Corbin and Strauss (2008) that distinguishes between context, phenomenon, ac-

tion/interaction, strategies, and consequences and is frequently used in IS research (e.g.,

O'Reilly and Finnegan, 2010; Strong and Volkoff, 2010). In the case of computational analy-

sis, the analyst also chooses a lexicon, for instance, a classification scheme or dictionary used

in sentiment analysis (Pang and Lee, 2008) or relevant criteria in cluster analysis (Anderberg,

1973).
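For instance, the following minimal sketch shows how a dictionary-style lexicon might supply the categories used to label raw trace text; the word lists are invented and far simpler than the sentiment lexicons surveyed by Pang and Lee (2008).

    # A minimal, hypothetical sketch of applying a dictionary-style lexicon:
    # the lexicon supplies the pre-theoretic categories used to label raw text.
    LEXICON = {
        "positive": {"great", "thanks", "works", "fixed"},
        "negative": {"broken", "fails", "bug", "crash"},
    }

    def label_with_lexicon(text, lexicon=LEXICON):
        """Count lexicon hits per category and return the dominant label."""
        tokens = [token.strip(".,!?") for token in text.lower().split()]
        counts = {category: sum(token in words for token in tokens)
                  for category, words in lexicon.items()}
        best = max(counts, key=counts.get)
        return best if counts[best] > 0 else "uncategorized"

    print(label_with_lexicon("Thanks, the patch works and the bug is fixed"))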

Diachronic Analysis: Using a pre-theoretic lexicon, the pre-theoretic understanding

yielded through synchronic analysis provides the logical point of departure to develop theory.

In selective coding (Strauss and Corbin, 1998), for instance, the analyst seeks to identify a

coherent theoretical scheme, thus identifying the correlations that matter in order to explain

the phenomenon under investigation. The resultant theoretical scheme typically provides a

sense of understanding and, if possible, causal relationships as important features of theory

(Reynolds, 1971). In computational analysis, diachronic relations are uncovered through in-

teraction with the goals and explanations drawn from and extending the field of research

(Holland et al., 1986). Both computational and manual grounded theory analysis require a

process of sensemaking as, ultimately, data analysis is a cognitive human process (Grolemund

and Wickham, 2014).
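One computational way of probing such diachronic regularities is to examine temporal ordering among category counts. The sketch below uses lagged correlations over two invented daily series; it is an assumption-laden illustration, not a reproduction of any analysis discussed here.

    # A minimal, hypothetical sketch of diachronic analysis: check whether one
    # category's daily counts tend to lead another's via lagged correlations.
    import numpy as np

    bug_reports = np.array([3, 5, 2, 8, 6, 7, 4, 9, 5, 6], dtype=float)
    commits = np.array([2, 4, 6, 3, 9, 7, 8, 5, 10, 6], dtype=float)

    def lagged_correlation(lead, follow, lag):
        """Correlate lead[t] with follow[t + lag]."""
        if lag == 0:
            return float(np.corrcoef(lead, follow)[0, 1])
        return float(np.corrcoef(lead[:-lag], follow[lag:])[0, 1])

    for lag in range(3):
        print(f"lag {lag}: r = {lagged_correlation(bug_reports, commits, lag):.2f}")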

It is important to note that this is an iterative process and should not be thought of as

linear. All aspects of the theory generation emerge over time—from the sample and the pat-

terns drawn from the sample to the lexicons brought to bear and the emerging theory itself.

During these cycles of theory generation there are three areas important to the theorizing: the

level of induction, the level of generalization, and the level of lexicon.

Next, we will illustrate the proposed grounded approach using studies published in the

field of IS.


Illustration of the Role of Lexicons in Building Theory

In IS research, a number of researchers have drawn upon existing lexicons to generate and

extend theory, for instance, using network data. To illustrate the application of the proposed

framework and major steps for grounded theory development using both manual and compu-

tational analysis, we turn to two example studies from the pages of Information Systems Re-

search.

Both are suitable cases in order to illustrate the pivotal role of pre-theoretic lexicons in

building theory from different types of data (both qualitative and quantitative), using different

methods (both manual and computational). These studies are:

- Bampo et al. (2008), which analyzes a marketing campaign to make theoretical assertions about how scale-free networks increase the likelihood of viral success.
- Chen et al. (2011), which analyzes news coverage and contact tracing data from patients associated with the severe acute respiratory syndrome (SARS) to develop a theoretical framework for outbreak management.

Table 1 provides an overview.

Table 1: Overview of illustrative application

Data collection (sampling)
  Bampo et al. (2008): Communication links activated when messages were transmitted through a digital network. Two-stage sampling: (1) empirical data from a viral marketing campaign with 39,000 “infectives” for the first generation of the viral marketing campaign; (2) data generated through a simulation model based on the analysis of the empirical data collected in the previous stage.
  Chen et al. (2011): Publicly available documents on a SARS outbreak in Taiwan and interviews with SARS patients; SARS-related hospital admission history was added for geographical contacts.

Synchronic analysis
  Bampo et al. (2008): Rounds of modeling and analysis, including simulation using alternative network models; identification of varying growth and reach patterns.
  Chen et al. (2011): Based on the analysis of the data, two categories were identified: personal contact and geographical contact; manual coding; multivariate time-series modelling; identification of dissemination patterns of the SARS outbreak.

Reference lexicon
  Bampo et al. (2008): Social network theory (e.g., nodes, edges, distance, connectedness).
  Chen et al. (2011): Theory of loose coupling (e.g., central control, decoupling).

Diachronic analysis
  Bampo et al. (2008): Theorizing around “seeding” influential nodes in the network and the impact of this seeding on viral marketing campaigns.
  Chen et al. (2011): Developing a process theory about disease control and healthcare governance, based on the synchronic view of loose coupling.

Next we will describe each study.

The Effects of the Social Structure of Digital Networks on Viral Marketing Performance

(Bampo et al. 2008)

Bampo et al. (2008) used the lexicon of social network analysis to analyze a General

Motors marketing campaign that went viral in Australia. That is, the concepts and relation-

ships provided by social network theory served as a pre-theoretic lexicon that allowed the

researchers to make sense of the data. The analyzed data set contained the communication

links activated when messages were transmitted through a digital network, with approximately 39,000 ‘infectives’ for the first generation of the viral marketing campaign.

Elements of the pre-theoretic lexicon used for the study include nodes, edges, distance, con-

nectedness, and other constructs common to this tradition. Applying the pre-theoretic lexicon

to the empirical data (and rounds of modeling and analysis, including simulation using alter-

native network models) in processes of synchronic analyses allowed the authors to generate

pre-theoretic understanding in terms of varying growth and reach patterns (i.e., synchronic

regularities) for viral marketing campaigns in light of the underlying social network struc-

tures. Based on this procedure and resultant pre-theoretic understanding, the authors make

theoretical assertions about how scale-free networks increase the likelihood of viral success,

and they theorize around “seeding” influential nodes in the network and the impact of this

seeding on viral marketing campaigns (i.e., diachronic regularities). Quantitative data and

computational methods are thus used in a theory building process, where a pre-theoretic lexi-

con allowed for developing a pre-theoretic understanding and subsequently the generation of


novel theory. We can observe that the major components of the above framework (pre-

theoretic lexicon, sample, pre-theoretic understanding, theory), as well as basic steps of the

proposed grounded method (data collection, sampling, synchronic analysis, reference lexicon,

diachronic analysis), are present.
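The sketch below gives a rough, hypothetical sense of this kind of analysis. It does not reproduce Bampo et al.'s model: the network generator, the forwarding probability, and the diffusion rule are all assumptions chosen for illustration. It merely shows how the social network lexicon (nodes, edges, degree) can be brought to bear on questions of seeding.

# A hypothetical sketch, not Bampo et al.'s actual model: a simulated
# scale-free network and a simple forwarding process, comparing "seeding"
# at high-degree nodes against random seeding. All parameters are assumptions.

import random
import networkx as nx

random.seed(1)
G = nx.barabasi_albert_graph(n=2000, m=2, seed=1)  # stand-in scale-free network

def simulate_spread(graph, seeds, p_forward=0.05, rounds=10):
    """Independent-cascade-style spread: each frontier node forwards to each
    neighbor with probability p_forward; newly reached nodes forward next round."""
    active, frontier = set(seeds), set(seeds)
    for _ in range(rounds):
        new = set()
        for node in frontier:
            for nbr in graph.neighbors(node):
                if nbr not in active and random.random() < p_forward:
                    new.add(nbr)
        active |= new
        frontier = new
    return len(active)

hubs = sorted(G.nodes, key=G.degree, reverse=True)[:20]
random_seeds = random.sample(list(G.nodes), 20)

print("reach with hub seeds:   ", simulate_spread(G, hubs))
print("reach with random seeds:", simulate_spread(G, random_seeds))

In the terms of our framework, the network measures supply the pre-theoretic lexicon, while the comparison of seeding strategies provides raw material for diachronic theorizing.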

Managing Emerging Infectious Diseases with Information Systems (Chen et al. 2011)

Chen et al. (2011) draw upon a lexicon from the theory of loose coupling to analyze

data from a SARS outbreak in Taiwan. Data collection included publicly available documents

as well as interviews with SARS patients. The authors use different qualitative (manual cod-

ing in order to identify similarities among contact incidents and thus contexts of infection)

and quantitative techniques (multivariate time-series modelling in order to explore changes in SARS dissemination patterns across these contexts, including the interactions among the

different contexts of infection) to analyze the data. Drawing on concepts such as central con-

trol and decoupling from the pre-theoretic lexicon, pre-theoretic understanding in terms of

dissemination patterns of the SARS outbreak (i.e., synchronic regularities) is thus generated. Sub-

sequently, the authors develop this pre-theoretic understanding into a process theory (i.e., diachronic regularities), extending their view of loose coupling to theorize about disease control and healthcare governance. We can thus again observe the major components

of the above framework as well as the steps of the proposed method.
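For readers unfamiliar with this class of techniques, the fragment below sketches what such multivariate time-series modelling can look like in Python. It uses simulated data and invented variable names and makes no claim to reproduce Chen et al.'s actual analysis.

# A minimal sketch of multivariate time-series modelling on simulated daily
# case counts for two contexts of infection. Variable names and the
# data-generating process are invented for illustration.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
days = 120
personal = rng.poisson(3, days).astype(float)
geographical = 0.5 * np.roll(personal, 3) + rng.poisson(1, days)  # trails personal contact

df = pd.DataFrame({"personal_contact": personal,
                   "geographical_contact": geographical})

# Fit a vector autoregression and inspect which lags of one series help
# explain the other.
model = VAR(df)
results = model.fit(maxlags=5, ic="aic")
print(results.summary())

The fitted lag structure is, again, only a regularity in the data; relating it to concepts such as central control and decoupling remains the analyst's interpretive work.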

To summarize, there are indeed examples where IS researchers have used pre-

theoretic lexicons in order to proceed to the level of theory via pre-theoretic understanding,

using different qualitative and quantitative methods. These lexicons provided pre-theoretic

understanding in relation to the phenomena that were analyzed. However, examples where IS

researchers use large amounts of data—such as in the first example—are still scarce. Through

the two examples, we can illustrate the applicability of the framework we suggested, both in

terms of general components and methodological steps and choices.


Discussion and Implications

Everyone who publishes in professional journals in the social sciences knows that you are supposed to

start your article with a theory, then make deductions from it, then test it, and then revise the theory. At

least that is the policy that journal editors and textbooks routinely support. In practice, however, I be-

lieve that this policy encourages—in fact, demands—premature theorizing and often leads to making up

hypotheses after the fact—which is contrary to the intent of hypothetico–deductive method. (Locke,

2007, p. 867)

In this quote, Edwin Locke points out how the policy of presenting research in terms of constructing hypotheses and testing them often goes against the real work of theory genera-

tion and empirical analysis. Quite frequently findings are developed inductively, and then

researchers disingenuously reconstruct a hypothetico-deductive paper out of the patterns they

found (Anonymous, 2015). This is because there is a stigma associated with “fishing” in data

for patterns that may simply be spurious correlations without theoretical explanations. In the

age of computational social science and trace data, there should be a mechanism for this pre-

tense to come to an end. How can researchers inductively generate theory from patterns they

see in data, without feeling the need to repackage their research in terms of hypothesis test-

ing? The answer lies in an explicit epistemological foundation and associated methodological

approach for analyzing trace data for the purpose of generating theory.

For decades qualitative researchers have understood that rigorous attention to empiri-

cal data via cycles of sensemaking can help generate novel theory. Using GTM, qualitative

researchers have a legitimizing tradition to draw upon when explaining patterns they see in

accordance with existing lexicons and proposing the resulting ideas in terms of theory genera-

tion. In this paper, we propose an approach for computational researchers dealing with trace


data to do the same, and we describe the epistemological foundations grounded in Habermas’

rational reconstruction.

We join with those calling for a broader “grounded paradigm” for theory development

based on the key features of grounded theory (Walsh et al., 2015). The epistemological posi-

tion relating the pre-theoretic patterns of correlation and the pre-theoretic lexicon appropriat-

ed by the analyst is pivotal to this paradigm. While we are primarily interested in the genera-

tion of new lexicons and the extension of existent lexicons through grounded analysis, these

activities typically require the application of an existent, pre-theoretic lexicon. What specific

lexicon an analyst uses is influenced by (a) the scientific community the analyst is part of and

(b) deliberative choices made by the analyst (e.g., what theories or frameworks are best suited to making sense of the data).

It has been recently stated that information mining and traditional theory building are

indeed complementary, interrelated methods (Dhar, 2013; Gopal, Marsden, and Vanthienen,

2011). The pivotal role of lexicons in theory building provides the keystone for explaining

this relationship. In order to make sense of patterns identified through computational meth-

ods, and to form appropriate mental models that can be used in the sensemaking process

(Holland et al., 1986), the analyst requires a lexicon that is shared by a community of scholars

(Habermas, 2003). This lexicon can be taken from existent theoretic lexicons, such as social network theory, which then serve as pre-theoretic lexicons in the process of novel theorizing.

The patterns generated through computational analysis constitute pre-theoretic understanding

in the form of synchronic regularities that can serve as a foundation for the development of novel

theory. Our framework can thus be seen as an answer to the call made by Gopal et al. (2011),

who suggest that “researchers may develop an iterative approach that uses information mining

outcomes as inputs into the theory construction and validation processes” (p. 370).


Glaser and Strauss led a revolution of sorts in social analysis. Through a program of

intense attention to empirical data, they legitimized a way to generate novel theory that could

revitalize a stale discourse. Some argue that the organizational and IS literatures may be stagnating (Davison, 2010) or not reaching their potential (Grover, 2013). Now, particularly given the

opportunity that the data explosion provides, it is not the time to close down methods for the-

ory generation that are grounded in empirical data. At the same time, it is important to capital-

ize on the maturity of GTM, and to encourage further methodological attention in this regard

in order to get the most out of the new opportunities proffered by the availability of data

with high volume, velocity, and variety (McAfee and Brynjolfsson, 2012).

The framework we proposed thus explicitly accounts for different modes of grounded

theory development, both manual and computational, and suggests that GTM as described in

seminal works (e.g., Charmaz, 2006; Glaser and Strauss, 1967; Strauss and Corbin, 1990) is

but one way to inductively develop theory. The framework has some important implica-

tions for IS research. While GTM has allegedly been used in IS research as a toolbox for coding data rather than for developing theory (Matavire and Brown, 2011; Urquhart et al., 2010), we contend that the opposite should be the case. Not only should we use GTM in order to

build theory, but as a discipline we should be open to rigorous approaches to developing theo-

ry that can be creative and adaptive, based on different types and volumes of data, using dif-

ferent manual and computational methods, and where the common denominator is that of

emergence (see also Birks et al., 2013). The more recent develop-

ments in computational social science (Lazer et al., 2009) promise unprecedented access to

different forms of data, thereby calling for alternative inductive approaches that move beyond

the manual consideration of qualitative data only. After all, the trace data available to IS

scholars is rife with opportunity to rethink phenomena in fundamentally different ways, driv-

en by intensive empiricism (Latour, 2010). If one were to compare this opportunity in social


science to physics, “it is as if every physicist had a supercollider dropped into his or her back-

yard” (Davis, 2010, p. 696).

The IS field is particularly well-positioned to lead this revolution in social research

(Agarwal and Dhar, 2014). First, as a discipline, we investigate those phenomena that have

made the trace data revolution possible—the phenomena that make computational social sci-

ence (Lazer et al., 2009) possible in the first place. Second, our discipline is devoted to inves-

tigating complex socio-technical settings that require us to make sense of large amounts of

data that pertain to the interaction of ‘the social’ and ‘the technical’ (Orlikowski, 2007).

Third, there is a very real need to develop novel and accurate theory grounded in large

amounts of data instead of “working” existing theories, as we are challenged to further devel-

op our intellectual core (Webster and Watson, 2002).

Conclusion

In this research commentary, we inquired how the lessons learned from GTM can be

used to build theory from trace data, thereby taking a first step towards a more general

grounded paradigm accommodating both manual and computational analysis of data. Spe-

cifically, drawing on Habermas’ rational reconstruction, we highlighted the importance of a

lexicon in this process.

We have argued that we need to focus on the all-important metaphor of emergence,

and understand how we can (a) use qualitative, quantitative, and computational data and (b)

consider prior theory in our attempts to build and justify theory. Based on three key

concepts in sociological research (level of induction, level of generalization, and lexicon), we

have proposed four major modes of research that either generate or explain lexicons, have described the major steps and the role of the lexicon therein, and have outlined the grounded process of theorizing from trace data. The framework provides an epistemological ba-

sis for developing theory from trace data with computational techniques and is intended to inform theory building efforts in the IS discipline. It is our contention that such a basis might draw more explicit attention to, and acceptance for, explora-

tion and theory building from all sorts of data, instead of repackaging research in terms of

hypothesis testing.

References

Agarwal, R., and Dhar, V. (2014). Editorial—Big Data, Data Science, and Analytics: The Opportunity and Challenge for IS Research. Information Systems Research, 25(3), 443-448.

Anderberg, M. R. (1973). Cluster Analysis for Applications: DTIC Document.

Anonymous. (2015). The Case of the Hypothesis That Never Was; Uncovering the Deceptive Use of Post Hoc

Hypotheses. Journal of Management Inquiry, 1056492614567042.

Bacharach, S. B. (1989). Organizational Theories: Some Criteria for Evaluation. Academy of Management

Review, 14(4), 496-515.

Bampo, M., Ewing, M. T., Mather, D. R., Stewart, D., and Wallace, M. (2008). The Effects of the Social

Structure of Digital Networks on Viral Marketing Performance. Information Systems Research, 19(3),

273-290. doi: 10.1287/isre.1070.0152

Birks, D. F., Fernandez, W., Levina, N., and Nasrin, S. (2013). Grounded Theory Method in Information Systems Research: Its Nature, Diversity and Opportunities. European Journal of Information Systems, 22(1), 1-8.

Bryant, A. (2002). Re-Grounding Grounded Theory. Journal of Information Technology Theory and

Application, 4, 25-42.

Bryant, A., and Charmaz, K. (2007). Grounded Theory Research: Methods and Practices. In A. Bryant and K.

Charmaz (Eds.), The Sage Handbook of Grounded Theory (pp. 1-28). London, UK: Sage.

Charmaz, K. (2000). Grounded Theory: Objectivist and Constructivist Methods. In N. K. Denzin and Y. S.

Lincoln (Eds.), Handbook of Qualitative Research (2nd ed., pp. 509–535). Thousand Oaks, CA: Sage.

Charmaz, K. (2006). Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. Thousand

Oaks, CA: Sage.

Chen, S., and Schlosser, S. W. (2008). Map-Reduce Meets Wider Varieties of Applications. Intel Research Pittsburgh, Tech. Rep. IRP-TR-08-05.

Chen, Y.-D., Brown, S. A., Hu, P. J.-H., King, C.-C., and Chen, H. (2011). Managing Emerging Infectious Diseases with Information Systems: Reconceptualizing Outbreak Management through the Lens of Loose Coupling. Information Systems Research, 22(3), 447-468. doi: 10.1287/isre.1110.0376

Ciborra, C. U. (1998). Crisis and Foundations: An Inquiry into the Nature and Limits of Models and Methods in the Information Systems Discipline. The Journal of Strategic Information Systems, 7(1), 5-16. doi: 10.1016/s0963-8687(98)00020-1

Contractor, N. S., Monge, P. R., and Leonardi, P. M. (2011). Multidimensional Networks and the Dynamics of

Sociomateriality: Bringing Technology inside the Network. International Journal of Communication, 5,

682-720.

Corbin, J., and Strauss, A. L. (2008). Basics of Qualitative Research. Thousand Oaks, CA: Sage.

Davis, G. F. (2010). Do Theories of Organizations Progress? Organizational Research Methods.

Davison, R. M. (2010). Retrospect and Prospect: Information Systems in the Last and Next 25 Years: Response

and Extension. Journal of Information Technology, 25(4), 352-354.

Dhar, V. (2013). Data Science and Prediction. Communications of the ACM, 56(12), 64-73.

DiMaggio, P. J. (1995). Comments on "What Theory Is Not". Administrative Science Quarterly, 391-397.

Eisenhardt, K. M. (1989). Building Theories from Case Study Research. Academy of Management Review,

14(4), 532-550.

Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996). Advances in Knowledge

Discovery and Data Mining.


Gaskin, J., Berente, N., Lyytinen, K., and Yoo, Y. (2014). Toward Generalizable Sociomaterial Inquiry: A

Computational Approach for ‘Zooming in & out’ of Sociomaterial Routines. MIS Quarterly,

forthcoming.

Glaser, B. G. (1978). Theoretical Sensitivity: Advances in the Methodology of Grounded Theory. Mill Valley,

CA: The Sociology Press.

Glaser, B. G. (2008). Doing Quantitative Grounded Theory: Sociology Press.

Glaser, B. G., and Strauss, A. L. (1967). The Discovery of Grounded Theory: Strategies for Qualitative

Research. Chicago, IL: Aldine Publishing Company.

Glaser, B. G. (2005). The Grounded Theory Perspective III: Theoretical Coding. Mill Valley, CA: Sociology Press.

Gopal, R., Marsden, J. R., and Vanthienen, J. (2011). Information Mining—Reflections on Recent

Advancements and the Road Ahead in Data, Text, and Media Mining. Decision Support Systems, 51(4),

727-731.

Grewal, R., Lilien, G. L., and Mallapragada, G. (2006). Location, Location, Location: How Network

Embeddedness Affects Project Success in Open Source Systems. Management Science, 52(7), 1043-

1056.

Grolemund, G., and Wickham, H. (2014). A Cognitive Interpretation of Data Analysis. International Statistical

Review.

Grover, V. (2013). Muddling Along to Moving Beyond in IS Research: Getting from Good to Great. Journal of the Association for Information Systems, 14, 274-282.

Guba, E. G., and Lincoln, Y. S. (1994). Competing Paradigms in Qualitative Research. In Handbook of Qualitative Research (Vol. 2, pp. 163-194).

Habermas, J. (1983). Interpretive Social Science Vs. Hermeneuticism. Social science as moral inquiry, 251-269.

Habermas, J. (2003). Truth and Justification. Cambridge, MA: MIT Press.

Handfield, R., and Melnyk, S. A. (1998). The Scientific Theory-Building Process: A Primer Using the Case of

Tqm. Journal of Operations Management, 16(4), 321-339.

Hipp, J., Güntzer, U., and Nakhaeizadeh, G. (2000). Algorithms for Association Rule Mining—a General Survey

and Comparison. ACM sigkdd explorations newsletter, 2(1), 58-64.

Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. (1986). Induction: Processes of Inference,

Learning, and Discovery: MIT Press, Cambridge, MA.

Hood, J. C. (2007). Orthodoxy Vs. Power: The Defining Traits of Grounded Theory. The Sage handbook of

grounded theory, 151-164.

Jones, M. R., and Karsten, H. (2008). Giddens's Structuration Theory and Information Systems Research. MIS

Quarterly, 32(1), 127-157.

Jones, R., and Noble, G. (2007). Grounded Theory and Management Research: A Lack of Integrity? Qualitative

Research in Organizations and Management: An International Journal, 2(2), 84-103.

Kearney, M. H. (2007). From the Sublime to the Meticulous: The Continuing Evolution of Grounded Formal

Theory. In A. Bryant and K. Charmaz (Eds.), The Sage Handook of Grounded Theory (pp. 127-150).

London, UK: Sage.

Kelle, U. (2007a). The Development of Categories: Different Approaches in Grounded Theory. In A. Bryant and

K. Charmaz (Eds.), The Sage Handbook of Grounded Theory (pp. 191-213). London, UK: Sage.

Kelle, U. (2007b). "Emergence" vs. "Forcing" of Empirical Data? A Crucial Problem of "Grounded Theory" Reconsidered. Historical Social Research/Historische Sozialforschung, Supplement, 133-156.

Latour, B. (2005). Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford, UK: Oxford University Press.

Latour, B. (2010). Tarde’s Idea of Quantification. In M. Candea (Ed.), The Social after Gabriel Tarde: Debates

and Assessments: Routledge.

Lazer, D., Pentland, A. S., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., . . . Gutmann, M. (2009). Life in

the Network: The Coming Age of Computational Social Science. Science, 323(5915), 721.

Legewie, H., and Schervier-Legewie, B. (2004). Research Is Hard Work, It's Always a Bit Suffering. Therefore on the Other Side It Should Be Fun: Anselm Strauss in Conversation with Heiner Legewie and Barbara Schervier-Legewie. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 5(3).

Lempert, L. B. (2007). Asking Questions of the Data: Memo Writing in the Grounded Theory Tradition. The

Sage handbook of grounded theory, 245-264.

Locke, E. A. (2007). The Case for Inductive Theory Building. Journal of Management, 33(6), 867-890.

Matavire, R., and Brown, I. (2011). Profiling Grounded Theory Approaches in Information Systems Research.

European Journal of Information Systems, 22(1), 119-129.


McAfee, A., and Brynjolfsson, E. (2012). Big Data: The Management Revolution. Harvard Business

Review(October 2012), 60-68.

Merton, R. K. (1957). Priorities in Scientific Discovery: A Chapter in the Sociology of Science. American

sociological review, 22(6), 635-659.

Morse, J. (2007). Sampling in Grounded Theory. The Sage handbook of grounded theory, 229-244.

O'Reilly, P., and Finnegan, P. (2010). Intermediaries in Inter-Organisational Networks: Building a Theory of

Electronic Marketplace Performance. European Journal of Information Systems, 19(4), 462-480.

Orlikowski, W. J. (1992). The Duality of Technology: Rethinking the Concept of Technology in Organizations.

Organization Science, 3(3), 398-427.

Orlikowski, W. J. (2007). Sociomaterial Practices: Exploring Technology at Work. Organization Studies, 28(9),

1435-1448.

Pang, B., and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and trends in information

retrieval, 2(1-2), 1-135.

Pedersen, J. (2008). Habermas' Method: Rational Reconstruction. Philosophy of the Social Sciences.

Peirce, C. S. (1992). Reasoning and the Logic of Things: The Cambridge Conferences Lectures of 1898: Harvard

University Press.

Reichertz, J. (2007). Abduction: The Logic of Discovery of Grounded Theory: Sage.

Reynolds, P. D. (1971). A Primer in Theory Construction. Needham Heights, MA: Allyn and Bacon.

Seidel, S., and Urquhart, C. (2013). On Emergence and Forcing in Information Systems Grounded Theory

Studies: The Case of Strauss and Corbin. Journal of Information Technology, 28(3), 237-260.

Shmueli, G., and Koppius, O. R. (2011). Predictive Analytics in Information Systems Research. MIS Quarterly,

35(3), 553-572.

Strauss, A. L. (1987). Qualitative Analysis for Social Scientists. Cambridge, UK: University of Cambridge Press.

Strauss, A. L., and Corbin, J. (1990). Basics of Qualitative Research (1st edition ed.). Thousand Oaks, CA: Sage.

Strauss, A. L., and Corbin, J. (1998). Basics of Qualitative Research. Techniques and Procedures for Developing

Grounded Theory (2nd ed.). London, UK: Sage.

Strong, D. M., and Volkoff, O. (2010). Understanding Organization-Enterprise System Fit: A Path to Theorizing

the Information Technology Artifact. MIS Quarterly, 34(4), 731-756.

Strübing, J. (2007). Research as Pragmatic Problem-Solving: The Pragmatist Roots of Empirically-Grounded Theorizing. In The Sage Handbook of Grounded Theory (pp. 580-601).

Suddaby, R. (2006). From the Editors: What Grounded Theory Is Not. Academy of management journal, 49(4),

633-642.

Sutton, R. I., and Staw, B. M. (1995). What Theory Is Not. Administrative Science Quarterly, 371-384.

Urquhart, C. (2013). Grounded Theory for Qualitative Research: A Practical Guide. London, UK: Sage.

Urquhart, C., and Fernández, W. (2013). Using Grounded Theory Method in Information Systems: The

Researcher as Blank Slate and Other Myths. Journal of Information Technology, 28(3), 224-236.

Urquhart, C., Lehmann, H., and Myers, M. D. (2010). Putting the ‘Theory’ Back into Grounded Theory:

Guidelines for Grounded Theory Studies in Information Systems. Information Systems Journal, 20(4),

357-381.

Van Maanen, J. (1979). The Fact of Fiction in Organizational Ethnography. Administrative Science Quarterly,

24(4), 539-550.

Walsh, I., Holton, J. A., Bailyn, L., Fernandez, W., Levina, N., and Glaser, B. (2015). What Grounded Theory Is… A Critically Reflective Conversation Among Scholars. Organizational Research Methods, 1094428114565028.

Webster, J., and Watson, R. T. (2002). Analyzing the Past to Prepare for the Future: Writing a Literature

Review. MIS Quarterly, 26(2), xiii-xxiii.

Weick, K. E. (1995). What Theory Is Not, Theorizing Is. Administrative Science Quarterly, 40(3), 385-390.