
Transcript
Page 1: Reader 36 WS 01 Rbae

Competence Center Corporate Data Quality

36th Workshop

READER

St. Gallen, 10th & 11th October 2013

IWI-HSG Institute of Information Management University of St. Gallen Müller-Friedberg-Strasse 8 CH-9000 St. Gallen


Contents

Otto, B., Aier, S.: Business Models in the Data Economy: A Case Study from the Business Partner Data Domain, in: 11th International Conference on Wirtschaftsinformatik, 27th February – 01st March 2013, Leipzig, Germany.

Wlodarczyk, T.W., Rong, C., Thorsen, K.A.H.: Industrial Cloud: Toward Inter-enterprise Integration, in: M.G. Jaatun, G. Zhao, C. Rong (Eds.), CloudCom 2009 (pp. 460-471). Berlin, Heidelberg: Springer.

Loshin, D.: Developing a Business Case and a Data Quality Road Map, in: D. Loshin, A Practitioner's Guide to Data Quality Improvement (pp. 67-90). Burlington: Morgan Kaufmann, 2011.


11th International Conference on Wirtschaftsinformatik, 27th February – 01st March 2013, Leipzig, Germany

Business Models in the Data Economy: A Case Study from the Business Partner Data Domain

Boris Otto and Stephan Aier

University of St. Gallen, Institute of Information Management, St. Gallen, Switzerland {boris.otto,stephan.aier}@unisg.ch

Abstract. Data management seems to be experiencing a renaissance today. One particular trend in the so-called data economy has been the emergence of business models based on the provision of high-quality data. In this context, the paper examines business models of business partner data providers and explores how and why these business models differ. Based on a study of six cases, the paper identifies three different business model patterns. A resource-based view is taken to explore the details of these patterns. Furthermore, the paper develops a set of propositions that help understand why the different business models evolved and how they may develop in the future. Finally, the paper discusses the ongoing market transformation process indicating a shift from traditional value chains toward value networks—a change which, if it is sustainable, would seriously threaten the business models of well-established data providers, such as Dun & Bradstreet.

Keywords: Business model, Case study, Data quality, Data resource management, Resource-based view

1 Introduction

Recent societal, economic, and technological developments, such as the management and exploitation of large data volumes ("big data"), the increasing business relevance of consumer data due to the rise of social networks, and the growing attention topics like data quality have received lately, seem to have triggered a renaissance of data management in enterprises. Analyst company Gartner has coined the notion of the "data economy" [1] in an attempt to introduce a single term subsuming these trends. The term implies viewing data as an intangible good. Research has been examining the transfer of management concepts for physical goods to the domain of intangible goods (such as data) since the 1980s [2], [3]. In parallel, business models have emerged that take up the idea of selling data of high quality.

Sourcing high-quality business partner data is of high relevance, particularly for the purchasing as well as the sales and marketing departments of large enterprises [4]. For example, reliable and valid business partner data (such as company names, company identifiers, or subsidiary company information) is a necessary prerequisite for cross-divisional spend analysis or for pooling purchasing volumes on a company-wide level. The demand for high-quality business partner data has fuelled the emergence of corresponding business models. A prominent example is Dun & Bradstreet (D&B).

While business partner data services have received attention in the practitioners' community for quite some time, research has not taken up the issue to a significant extent so far (a notable exception is the work of Madnick et al. [4]). To this day, no comprehensive analysis of business models in the field of business partner data services has been published. The paper at hand addresses this gap in the literature and aims at exploring business models in the business partner data domain. In particular, our research investigates the question of how and why business models of business partner data providers differ.

2 Theoretical Background

2.1 Data as an Economic Good

A clear, unambiguous, and widely accepted understanding of the two terms data and information does not exist [5], [6]. One research strand sees information as knowledge exchanged during human communication, whereas another takes an information processing lens according to which pieces of data are the building blocks of information [7]. The aim of the paper is not to take part in that discussion, but to follow one specific definition, which is to view information as processed data [2].

The value of data is determined by its quality [8]. Data quality is defined as a context-dependent, multidimensional concept [9]. Context dependency means that quality requirements may vary depending on the specific situation data is used in. Multidimensionality refers to the fact that there is no single criterion by which data quality can be fully ascertained. Examples of data quality dimensions are accuracy, availability, consistency, completeness, and timeliness.
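The multidimensional nature of data quality can be illustrated with a small sketch that scores business partner records on two of the dimensions named above, completeness and timeliness. The records, field names, and thresholds are invented for illustration and are not taken from the paper.

```python
from datetime import date

# Illustrative business partner records; names, fields, and dates are invented.
records = [
    {"name": "Acme GmbH", "address": "Berlin", "last_verified": date(2013, 9, 1)},
    {"name": "Beta AG", "address": None, "last_verified": date(2011, 1, 15)},
]

def completeness(record):
    """One simple completeness measure: the share of fields that are filled."""
    filled = sum(1 for value in record.values() if value is not None)
    return filled / len(record)

def timeliness(record, today=date(2013, 10, 10), max_age_days=365):
    """1.0 while the record was verified within max_age_days,
    then decaying linearly toward 0 over another max_age_days."""
    age = (today - record["last_verified"]).days
    return max(0.0, 1.0 - max(0, age - max_age_days) / max_age_days)

for r in records:
    print(r["name"], round(completeness(r), 2), round(timeliness(r), 2))
```

Context dependency then shows up in the parameters: a marketing use case might accept `max_age_days=730`, while a compliance use case might not.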

2.2 Business Partner Data

Business partner data typically comprises organization data (e.g. company names, addresses, and identifiers, but also industry classification codes), contact data (e.g. telephone numbers and e-mail addresses of companies), and banking information. Madnick et al. [4] have identified three challenges when it comes to managing business partner data in an organization. The first challenge, identical entity instance identification, refers to the problem of identifying certain business partners: in many cases an unambiguous, unique name or identification number is missing, and one and the same business partner is referred to by several synonyms across the organization. The second challenge, entity aggregation, relates to the problem of knowing about and identifying the parts and subsidiaries a certain business partner consists of. The third challenge, transparency over inter-entity relationships, becomes relevant if, for example, the overall revenue generated with a certain customer needs to be determined, including direct sales but also third-party sales and reselling.
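The first of these challenges, identical entity instance identification, can be made concrete with a minimal matching sketch: records that refer to the same business partner under synonymous names are grouped by a normalized key. The normalization rules, the legal-form list, and the sample names are illustrative assumptions, not a method from the paper.

```python
import re
from collections import defaultdict

# Legal-form tokens to strip when building a match key; an illustrative,
# deliberately incomplete list.
LEGAL_FORMS = {"gmbh", "ag", "inc", "ltd", "corp", "corporation", "co"}

def normalize(name):
    """Lowercase, drop punctuation, and strip legal-form tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)

# Synonyms for the same business partner collapse onto one key.
names = ["Acme Corp", "ACME Corporation", "Acme, Inc.", "Beta AG"]
groups = defaultdict(list)
for n in names:
    groups[normalize(n)].append(n)

print(dict(groups))
```

Real entity resolution is considerably harder (misspellings, translations, mergers), which is precisely why the data providers studied here invest labor and expertise in it.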


2.3 Business Model Theory

A business model describes how an organization creates value [10], [11]. Business model research typically draws upon three paradigmatic perspectives on business strategy, namely the industrial organization perspective [12], the resource-based view [13], [14], and the strategy process perspective [15], [16]. The industrial organization perspective focuses on external forces that affect the work of managers. Substitute products, customers, suppliers, and competitors have an effect on strategic decisions, such as differentiation of products [17]. The resource-based view states that company-specific sets of resources determine whether a company is able to achieve above-average performance [13], [14]. According to the resource-based view, the characteristics of key resources are value, rareness, inimitability, and non-substitutability (VRIN criteria) [14]. The strategy process perspective, finally, focuses on the managerial function [16].

In the mid-1990s, business models started to receive increasing attention in the scientific community as the first electronic business models emerged [18]. Research at that time was mostly descriptive and analytical in nature. In general, when defining the term business model, many authors referred to a set of concepts representing the underlying meta-model. Each concept can be instantiated differently in a specific business model. Typically, these meta-model concepts were then combined with business model frameworks. More recently, the scientific community has started to provide guidance and support for designing business models. Osterwalder and Pigneur, for example, have proposed a handbook for "business model generation" [19].

Hedman and Kalling [20] have proposed a business model framework which builds on the three paradigmatic perspectives outlined above. Their framework consists of seven concepts, namely (1) customers, (2) competitors, (3) offering, (4) activities and organization, (5) resources, and (6) factor and production inputs. It also has a longitudinal process component to cover the dynamics of the business model over time, which is referred to as (7) scope of management.
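The seven concepts can be captured as a simple record type; instantiating it once per provider is one way to make the case comparisons in Section 4 systematic. The field names follow the text; the abbreviated Avox values are taken from the case description and table in Section 4.

```python
from dataclasses import dataclass

@dataclass
class BusinessModel:
    """The seven concepts of Hedman and Kalling's framework [20]."""
    customers: str
    competitors: str
    offering: str
    activities_and_organization: str
    resources: str
    factor_and_production_inputs: str
    scope_of_management: str  # the longitudinal process component

# Abbreviated instantiation for Avox, based on the case description in Section 4.
avox = BusinessModel(
    customers="n/a",
    competitors="Interactive Data, SIX Telekurs",
    offering="One million entities, three service types, web services",
    activities_and_organization="Data retrieval, analysis, cleansing, provision",
    resources="38 analysts, central database",
    factor_and_production_inputs="Third-party vendors, official sources, customers",
    scope_of_management="International coverage, co-creation, partnering",
)
```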

3 Research Design

3.1 Overview

The paper aims at investigating business models in the business partner data domain. For this purpose, case study research was chosen as the underlying method, as it allows examining contemporary phenomena in their real-world context at an early stage of research [21-23]. The course of the research follows the five guiding points proposed by Yin [21], namely (i) research question, (ii) research propositions, (iii) unit of analysis, (iv) logic which links the data to the propositions, and (v) criteria for interpreting the findings.

As outlined in Section 1, the paper aims at investigating the (i) research question of how and why business models in the business partner data domain differ. The case study explores a phenomenon which is still relatively unaddressed and for which only limited theoretical knowledge exists. Yin [21] concedes that in exploratory cases sound theoretical (ii) research propositions are hardly available. However, he advises designing a conceptual framework that guides the investigation. Section 3.2 describes the conceptual framework used in the paper. A clear definition of the (iii) unit of analysis is important for determining the validity and generalizability of case study results, as it sets the boundaries of the scope of the analysis. In this paper, the unit of analysis is the domain of business models of business partner data providers. The conceptual framework also works as the (iv) logic which links the data to the propositions. In fact, the conceptual framework forms a lens through which the individual cases can be studied and compared. Finally, (v) criteria for interpreting the findings are derived from the theoretical foundations of business model research, particularly by taking a resource-based view. The interpretation of findings results in propositions on design patterns for business models in the business partner data domain.

3.2 Conceptual Framework

The paper's main goal is not to advance business model theory in general, but to use existing business model research as a lens to study observable business models in a particular domain, namely business partner data services. In order to systematically describe and analyze the cases, the paper uses the business model framework proposed by Hedman and Kalling [20] (see Section 2.3) as a conceptual framework. This model was chosen for two reasons. First, it is the result of a comprehensive analysis of the literature on business models. Second, it combines the three paradigmatic perspectives on business strategy. Hence, Hedman and Kalling's business model framework is well suited to explore the research question addressed in this paper.

3.3 Case Selection

The case study selection process consisted of two steps. The first step used a focus group to determine the most relevant business partner data providers from a practitioners' perspective. In general, focus groups are an adequate research method for examining the level of consensus within a certain community [24]. The focus group convened on February 3, 2011, in Ittingen, Switzerland. Participants were 28 enterprise data managers from large multinational organizations. They were presented an overview of business models of business partner data providers and were then asked (among other things) to identify the four most relevant players on a list of 24 well-known data providers. Criteria in the selection process referred to the conceptual framework and included, for example, the "offering" (availability of consulting services), "resources" (expertise in the domain), and the "scope of management" (global or regional). The participants chose Avox, BvD, D&B, and InfoGroup OneSource as the four most important providers, so these four were selected for the case study. In a second step, the list of four was extended by two more players who had entered the market only shortly before, namely Factual and Infochimps. These two providers were chosen following the principle of theoretical replication [22], i.e. predicting contradictory results compared to the four pre-selected cases.


3.4 Data Collection and Analysis

Data was collected from multiple sources, beginning with publicly available information such as annual reports and information provided on websites. Furthermore, the companies were contacted via e-mail and telephone and asked for more detailed information on their service offerings. Main contact persons included the head of Business Intelligence & Key Account Management at D&B in Switzerland, a regional sales manager at BvD, and the Chief Operating Officer at Avox.

Data analysis used the conceptual framework presented in Section 3.2 as a theoretical lens to link the data to the different concepts of the business model framework. In the case of Avox, for example, the interview protocols, documents from the public domain (e.g. press releases and website information) as well as internal presentations on the Avox business model were analyzed according to Hedman and Kalling's framework. Section 4 presents the results of the case analysis.

4 Business Models of Business Partner Data Providers

4.1 Business Models of the Case Study Companies

Avox is a provider of business partner data (i.e. names, addresses, chamber of commerce numbers, ownership structures, etc.) on the legal entities companies do business with. Avox specializes in business partner data relevant for the financial services industry. The data is stored in a central database which is fed by three main sources, namely (i) third-party data vendors (such as the Financial Times), (ii) companies providing information about themselves (such as annual reports, chamber of commerce information, or website information), and (iii) customers providing updates. Thus, Avox customers not only receive business partner data, but also contribute to the Avox database—typically on a weekly basis. Avox offers business partner data via three different services: (i) basic subsets of business data records are offered for free via wiki-data; (ii) access to the Avox database for more comprehensive data is granted at a regular fee; and (iii) customer-specific services are offered at individually agreed prices.

BvD is a provider of business partner data and related software solutions. BvD's service portfolio is threefold. First, there is a database solution which basically offers access to the central database. Second, the company provides so-called "catalysts"—for the specific needs of procurement or compliance departments, for example. Third, custom-made consulting services are offered for business partner data integration with customers' enterprise systems, such as SAP or salesforce.com. BvD's core activities comprise processing and combining data from more than one hundred different sources, linking this data, and extending it with ownership and contact information from own research activities. The pricing model is based on both subscription and usage fees and also includes individual arrangements for customer-specific services.

D&B operates a database of approximately 177 million business entity records from more than 240 countries. D&B maintains the nine-digit D-U-N-S number assigned to each organization in the database. The D-U-N-S number is used by the purchasing, sales, and marketing departments of customers for identifying, organizing, and consolidating information about business partners and for linking data about suppliers, customers, and trading partners. The D&B pricing model includes subscription and usage fees, licensing components, and customer-specific fees for services.
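The role of the D-U-N-S number as a linking key can be sketched as follows: records from separate systems are checked against the nine-digit format and consolidated per identifier. The sample records and source-system names are invented; the sketch validates only the shape of the identifier, not whether a number is actually assigned by D&B.

```python
import re
from collections import defaultdict

def is_valid_duns(value):
    """Check the nine-digit format, with or without the common 2-3-4 hyphenation.
    This validates shape only, not whether the number is actually assigned."""
    return re.fullmatch(r"\d{2}-?\d{3}-?\d{4}", value) is not None

# Invented records from two hypothetical source systems.
records = [
    {"duns": "15-048-3782", "source": "CRM", "name": "Example Supplier Ltd"},
    {"duns": "150483782", "source": "ERP", "name": "Example Supplier"},
    {"duns": "12345", "source": "ERP", "name": "Malformed record"},
]

# Consolidate records under a canonical (unhyphenated) D-U-N-S key.
by_duns = defaultdict(list)
for r in records:
    if is_valid_duns(r["duns"]):
        by_duns[r["duns"].replace("-", "")].append(r)

# The two valid records link to one entity; the malformed one is rejected.
```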

Factual provides open data to developers of web and mobile applications. The service was initially offered for free. After the initialization phase, the service is now charged per data set, for example; optionally, a flat rate can be booked. Large customers pay individually agreed fees. A special aspect of Factual's business model is the fact that these fees depend on different aspects, such as the number of edits and contributions from a customer's "community" to the Factual database (i.e. the company grants discounts which increase with the number of edits and contributions), customer-specific requirements for API service levels (such as response times and uptimes for technical support), the volume of page views or active users, the types of data sets accessed, and "unencumbered" data swaps (such as "crosswalking IDs"). Besides business partner data, Factual offers a variety of other, continuously growing datasets.
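The contribution-dependent fee described for Factual can be illustrated with a toy pricing function in which a base fee is discounted as community edits grow. All numbers (base fee, tiers, discount rates) are invented for illustration; the paper does not disclose Factual's actual rates.

```python
def monthly_fee(base_fee, community_edits):
    """Toy discount schedule: the more edits a customer's community contributes,
    the larger the discount on the base fee. All tiers and rates are invented."""
    tiers = [(10_000, 0.50), (1_000, 0.25), (100, 0.10)]
    discount = next((d for threshold, d in tiers if community_edits >= threshold), 0.0)
    return base_fee * (1 - discount)

print(monthly_fee(1000, 0))       # full fee, no contributions
print(monthly_fee(1000, 150))     # small community discount
print(monthly_fee(1000, 50_000))  # discount capped at 50%
```

The interesting structural point is the incentive: the fee schedule rewards exactly the behavior (contributing edits) that improves the provider's key resource, its database.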

Infochimps provides business partner data that is created both by Infochimps itself and by the user community. A small number of data sets are available for free; for all other data sets a fee has to be paid. Infochimps charges a commission fee for brokering data sets provided by users. Infochimps offers four different pricing models depending on the use of APIs per hour and per month. Infochimps does not limit its offering to the business partner data domain, but offers a variety of other data records as well, such as NFL football statistics. One business partner data set is titled "International Business Directoy [sic!]". It contains addresses of 561,161 businesses and can be purchased at a price of USD 200. In case customers cannot find the data required, Infochimps offers to retrieve it on a case-by-case basis.

InfoGroup OneSource offers business partner data on 17 million companies and 23 million business executives on a global level. A key business process is enriching data from a variety of different external sources. The OneSource LiveContent platform combines data from over 50 data suppliers and thousands of other data sources. The data is delivered over the web, through integration into Customer Relationship Management (CRM) systems, and via information portals. Moreover, OneSource delivers data on a "data as a service" basis to salesforce.com users. OneSource charges subscription fees starting at EUR 10,000 p.a.

Table 1 uses the conceptual framework introduced above to compare the business models of the six business partner data providers included in the case study.


Table 1. Business Models of the Case Study Companies

Concept | Avox | BvD | D&B | Factual | Infochimps | InfoGroup OneSource
Customers | n/a | 6,000 clients, 50,000 users | 100,000 from various industries | n/a | n/a | Several thousands
Competitors | Interactive Data, SIX Telekurs | D&B, among others | BvD, among others | Similar offering as Infochimps | Similar offering as Factual | D&B, among others
Offering | One million entities, three service types, web services | 85 million companies, data and software support, web services, sales force | 177 million business entities, data and related services, web services, sales force | Open data platform, API use for free or at a charge | 15,000 data sets, open data platform, four different pricing models, web service | 18 million companies, 20 million executives, data and software, web service
Activities and organization | Data retrieval, analysis, cleansing and provision | Monitoring of mergers and acquisitions, data analysis and provision | Data collection and optimization, provision of quality data services | Data mining, data retrieval, data acquisition from external parties | Data collection, infrastructure development, hosting, and distribution | Selection of content providers, data collection, "data blending", data updates
Resources | 38 analysts to verify and cleanse data, central database | 500 employees in 32 offices, central database (ORBIS) | More than 5,000 employees, central database | 21 employees, central open data platform | Less than 50 employees, central data platform | 104 employees
Factor and production inputs | Third-party vendors, official data sources, customers | More than 100 different data sources | Official sources, partnering, contact to companies | Open data community | Open data community | 50 "world-class" suppliers, 2,500 data sources
Scope of management | International coverage, co-creation, partnering | Global coverage, alliances, data, software, consulting | Global coverage | Start-up company | Start-up company | Global coverage

4.2 Resource Perspective

Resources play a key role in the development and maintenance of business models. Drawing upon the VRIN criteria, six key resources can be identified to be relevant for the specific business models of business partner data providers (see Table 2).


Table 2. Key Resources for Business Models of Business Partner Data Providers

Resource | Valuable | Rare | Inimitable | Non-substitutable
Labor | Yes | No | No | No
Expertise and Knowledge | Yes | Yes | No | Yes
Database | Yes | Yes | No | Yes
Information Technology and Procedures | Yes | No | No | No
Network Access and Relationships | Yes | Yes | Yes | Yes
Capital | Yes | Yes | No | No

Labor is used primarily to collect and analyze data. D&B, for example, employs thousands of people to retrieve business partner data from chambers of commerce and other public data sources. As no special skills are needed to perform this task, labor is considered an imitable resource. Expertise and Knowledge refers to how business partner data is actually used, how business processes for creating and maintaining business partner data are designed, and how typical data quality problems are dealt with in customer organizations. Similar to labor, this expertise and knowledge is imitable, as domain expertise is available both in the practitioners' and the research community [4]. A Database is a resource which is valuable, rare, and non-substitutable. The data itself, however, is imitable, in particular because business partner data mainly refers to company names and addresses, subsidiary company information, and the legal form, i.e. data which is available in the public domain. Information Technology and Procedures—e.g. an electronic platform through which business partner data is accessible for customers and which offers data aggregation and cleansing procedures—is valuable but does not meet any other VRIN criterion. Network Access and Relationships is of particular importance, as all cases depend on access to external data sources, such as chambers of commerce (D&B) or customers (Avox). This resource is the only one that meets all four VRIN criteria. Finally, Capital is a resource which is valuable and rare, but neither inimitable nor non-substitutable.
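The VRIN assessment in Table 2 can be restated as a small program that flags which resources satisfy all four criteria and can therefore ground a sustained competitive advantage. The data mirrors Table 2 exactly.

```python
# VRIN profile per resource, mirroring Table 2:
# (valuable, rare, inimitable, non_substitutable)
resources = {
    "Labor": (True, False, False, False),
    "Expertise and Knowledge": (True, True, False, True),
    "Database": (True, True, False, True),
    "Information Technology and Procedures": (True, False, False, False),
    "Network Access and Relationships": (True, True, True, True),
    "Capital": (True, True, False, False),
}

# Only a resource meeting all four criteria can ground a sustained advantage.
sustained = [name for name, criteria in resources.items() if all(criteria)]
print(sustained)  # ['Network Access and Relationships']
```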

5 Case Analysis

5.1 Business Model Patterns

The analysis of the business models presented in the case study reveals a number of similarities between the cases investigated. The biggest similarity refers to the data providers' core activities, which mainly consist of retrieving and collecting data, consolidating it, and then providing it to their customers. Moreover, the companies use similar pricing model elements, ranging from subscription and usage fees to customer-specific service fees. However, there are also significant differences. One main difference relates to the way the companies examined stand in relation with other actors from the network they are embedded in. As a result of the analysis, three business model patterns can be identified (see Figure 1).

Pattern I depicts the traditional buyer-supplier relationship between data consumers and data providers. A typical instantiation of this pattern can be found at D&B, for example. The flow of data is unidirectional, and so is the flow of money. Pattern II, in contrast, uses community sourcing principles and shows bidirectional flows of data [25], [26]. In this pattern, data consumers provide data back to a common platform, and so they become "prosumers" [27]. The more they contribute, the larger the discounts they get on their fee as data consumers. This mechanism can be found at Avox and Infochimps, for example. Pattern III relies mainly on crowd sourcing mechanisms [28]. Here, the data provider collaborates with parties providing data which are not necessarily data consumers at the same time.

Fig. 1. Business Model Patterns

While all business models of the data providers under investigation rely on the provision of data by third parties to a certain extent, the business models that can be related to Pattern III are completely based on the principles of crowd sourcing. Both community sourcing and crowd sourcing have their roots in innovation management and its goal to include users and customers in the research and development process, and so the terms are often used synonymously. The paper, however, makes a distinction between the two terms by looking at the actual sources. Whereas Pattern II uses data from a clearly defined community, namely customers, Pattern III does not pose any restrictions at all, as long as providers of data comply with existing laws and terms and conditions. Moreover, the community sourcing approach is closely related to ensuring and improving the quality of the data in terms of data accuracy and consistency. Crowd sourcing concepts typically relate to data quality only in terms of data availability.


5.2 Resource Allocation Patterns

To further explore the different business model patterns, a resource-based view is taken regarding the companies presented in the case study. The analysis focuses on the differences occurring in the allocation of the six resources introduced in Section 4.2. Figure 2 shows the results of this analysis.

Fig. 2. Resource Allocation in the Case Study Companies

"Traditional" data providers, such as BvD, D&B, and InfoGroup OneSource, are characterized by extensive allocation of resources in terms of Labor, Database, and Capital, but only medium allocation of resources with regard to Network Access and Relationships (although D&B, for example, employs about 5,000 people, which is by far more than any other competitor). In contrast, the business models of Factual and Infochimps rely on Network Access and Relationships to a major extent, although neither one employs a lot of staff or has sound Expertise and Knowledge in the business partner domain. As a consequence, both data providers use crowd sourcing mechanisms to enhance their databases. Avox takes an intermediate position when it comes to allocation of resources. Avox's strongest resource is Expertise and Knowledge regarding a specific domain, namely business partner data for the financial industry.

6 Interpretation of Case Study Findings

6.1 Business Model Framework

Taking a resource-based view helps find explanations for why the six business partner data providers under examination use different business models. For example, being a de-facto monopolist, D&B was able to develop adequate resources to acquire and manage business partner data over decades. These resources—i.e. mainly Labor and Database—have allowed D&B to broadly diversify its offering in terms of scope, quality, and price of services. D&B's ability to differentiate works as an entry barrier for new competitors. Since D&B is able to achieve high allocation of almost all of its key resources, new entrants into the business partner data market are forced to find ways of extending their own resource base.

Two approaches to extending one's resource base can be identified. Pattern II (community sourcing), as used by Avox, for example, represents a rather "conservative" approach, with customers contributing to the service provider's resources. This approach is appropriate if data providers are able to leverage existing customer relationships in related areas of business (the financial industry with a European focus in the case of Avox). A more "radical" extension of the resource base can be observed in business models following Pattern III (crowd sourcing), as used by Factual, for example. As a start-up company, Factual did not have any access to data via internal databases or existing customers, but had to build up its resources from scratch.

The downside for providers of business partner data services following Pattern II and Pattern III is that—although having successfully entered what was until then a de-facto monopoly market—they are limited in their offerings (data on certain industries only, or data from customers only, for example) and in the quality of the data they provide (community sourced or crowd sourced data is difficult to manage).

Fig. 3. Business Model Framework for Business Partner Data Providers

Exploring the situation of D&B, Avox, and Factual as typical examples of Patterns I, II, and III, respectively, the paper proposes a business model framework (Figure 3) for business partner data providers. The framework comprises five discrete dimensions: pricing (premium pricing vs. budget pricing), quality (managed data vs. unmanaged data), sourcing (self-sourcing vs. crowd sourcing), market share (high vs. low), and offering (broad vs. niche). As the first three dimensions (pricing/quality/sourcing) correlate, they can be combined to form one single dimension. The same is true for the two other dimensions (market share and offering)—although in a more differentiated sense: while a niche provider—although strong in its niche—has a low overall market share, a low market share does not necessarily point to a niche provider but may also be the result of an early stage of market penetration.

Figure 3 illustrates the current positions of D&B, Avox, and Factual in the framework, which consists of four quadrants: niche provider, new market entrant, well-established crowd-sourcer, and well-established traditional provider. The labeling of the quadrants takes into account the dynamics of the market and potential development paths the market participants may follow.

As far as Factual is concerned, its position in the lower left quadrant (new market entrant), indicating a low market share and low-quality, low-cost data, is highly unlikely to be sustainable. Therefore, the necessary development for Factual is to increase its market share in order to create new opportunities for more differentiated pricing models and active data management.

Avox, as a niche provider, and D&B, as a well-established traditional provider, have no immediate need to change their respective business models, which, however, only holds true in a stable environment (i.e. if there are proper niches to occupy and if there is limited competition in the premium segment, respectively). Relying on a single niche may be dangerous for Avox, as specialized knowledge may become generally available or may lose its value in the future. Therefore, it may be an option for Avox to leverage its expertise in exploiting one niche segment and increase its market share by addressing further niches or extending its offering to existing customers (by means of mergers and acquisitions, for example).

Moreover, taking a resource-based view shows that there are not many key resources that are valuable, rare, inimitable and non-substitutable at the same time. In fact, Network Access and Relationships is the only key resource that meets each of the four criteria. In this regard, the well-established provider (D&B) has a rather weak position as far as the size of its network is concerned. At the same time, Factual, as a new entrant to the market, currently has the largest network and may be able to further improve its position regarding its other key resources. If this happened, Factual’s business model would become a “game changer”, since Factual would be able to offer similar offerings as D&B—managed data, for example—at much lower prices, thanks to its completely different cost structure. This would even affect the basic layout of the business model framework presented above, as the correlation of the framework dimensions would then become unstable. Furthermore, it is questionable whether D&B would be able to imitate this network resource, since that would require significantly different competencies and a different scope of management.

Apart from that, the business partner data domain includes both companies representing the value chain paradigm (D&B, for example) and companies representing the value network paradigm (Factual, for example) [29]. Value networks leverage positive network effects [30], i.e. each new member of the network increases the value of the network for all members. A value network may increase value and reduce costs at the same time, and thus create “winner-takes-all” situations through a bandwagon effect [29].
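The bandwagon dynamic can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and assumes a Metcalfe-style value function (value grows with the number of member pairs), which is one common way to model the positive network effects described in [30]; the paper itself does not prescribe a specific formula.

```python
def network_value(members: int, value_per_link: float = 1.0) -> float:
    """Metcalfe-style value: proportional to the number of member pairs."""
    return value_per_link * members * (members - 1) / 2

def marginal_value(members: int) -> float:
    """Value added for the member who joins a network of size `members`."""
    return network_value(members + 1) - network_value(members)

# Each new member raises the value for everyone, and joining the larger
# network is always more attractive -- the bandwagon effect.
assert marginal_value(100) > marginal_value(10)
```

Because the marginal value of joining grows with network size, new members gravitate toward the largest network, which is how winner-takes-all situations can emerge.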

6.2 Research Propositions

From the findings of the case study and the conclusions made with the help of the business model framework a set of propositions can be identified (see Table 3). These propositions help understand current business models of business partner data providers and outline their potential future development. Furthermore, the propositions lay the ground for future research to be done.

Table 3. Propositions on Business Models for Business Partner Data Providers

P1: New market entrants follow a growth strategy. (Supported by the cases of: Factual, Infochimps)

P2a: New market entrants choose either a niche strategy focusing on high-quality data (community sourcing) or a general strategy focusing on lower-quality data (crowd sourcing). (Supported by the cases of: Avox, Factual, Infochimps)

P2b: Whether a niche strategy or a general strategy is chosen depends on having access to a niche community. (Supported by the case of: Avox)

P3: Only a strong market position allows business partner data providers to differentiate their product portfolios and their pricing models. (Supported by the cases of: BvD, D&B)

P4a: A strong market position may be achieved both by focusing on budget-priced community data and by focusing on managed high-quality data. (Supported by the cases of: Factual, Infochimps, D&B)

P4b: A strong market position may not be achieved by focusing on niche data. (Supported by the case of: Avox)

P5: Community sourcing and even crowd sourcing will be a relevant approach in times of increasing cost competition. (Supported by the cases of: Avox, Factual, Infochimps)

P6: If a new market entrant successfully creates significant network effects by turning a value chain industry into a value network industry, this transformation will be irreversible and mandatory to follow for its competitors. (Supported by the cases of: Avox, D&B, Factual, Infochimps)

7 Conclusion

The paper addresses two research questions with regard to business models of business partner data providers. First, it explores how these business models differ. The case study results imply that business models follow one of three different business model patterns: traditional buyer-supplier relationship, community sourcing, or crowd sourcing. These patterns differ mainly with regard to the instantiation of three business model concepts, namely “activities and organization”, “resources”, and “factor and production inputs”. Second, the paper examines why business models of business partner data providers differ. Adopting a resource-based view, the paper develops a business model framework in which business partner data providers can be positioned. Moreover, the paper identifies a set of propositions that help understand why these different business models evolved and how they may develop in the future.

The paper contributes to the scientific body of knowledge as it is among the first endeavors to address business models in the business partner data domain, which is a topic of high relevance but still scarcely examined in the field of information systems research. Case description and analysis are grounded in theory and lead to a set of propositions.

The paper may also benefit the practitioners’ community. The analysis of the business models together with the business model patterns that have been identified may help business partner data providers reflect on their strategy and develop it further. Business partner data consumers may benefit from the findings by gaining a better understanding of the supply side of the market.

Limitations of the paper derive mainly from the nature of case study research as a method of qualitative research. The paper is a first explorative step to deepen the understanding of business models in the business partner data domain. To achieve more theoretical robustness—by elaborating on the causal relationships underlying the propositions and by testing these propositions—further qualitative, but also quantitative research is required. For example, the business model patterns may be triangulated with business models of other data providers.

Acknowledgement

The research presented in this paper was partially funded by the European Commission under the 7th Framework Programme in the context of the NisB (The Network is the Business) project (Project ID 256955).

References

1. Newman, D.: How to Plan, Participate and Prosper in the Data Economy. Gartner, Stamford, CT (2011)

2. Wang, R.Y.: A Product Perspective on Total Data Quality Management. Communications of the ACM 41, 58-65 (1998)

3. Goodhue, D.L., Quillard, J.A., Rockart, J.F.: Managing The Data Resource: A Contingency Perspective. MIS Quarterly 12, 373-392 (1988)

4. Madnick, S., Wang, R., Zhang, W.: A Framework for Corporate Householding. 7th International Conference on Information Quality, Cambridge, MA, 36-46 (2002)

5. Badenoch, D., Reid, C., Burton, P., Gibb, F., Oppenheim, C.: The value of information. In: Feeney, M., Grieves, M. (eds.): The value and impact of information. pp. 9-75. Bowker-Saur, London (1994)

6. Boisot, M., Canals, A.: Data, information and knowledge: have we got it right? Journal of Evolutionary Economics 14, 43-67 (2004)


7. Oppenheim, C., Stenson, J., Wilson, R.M.S.: Studies on Information as an Asset I: Definitions. Journal of Information Science 29, 159-166 (2003)

8. Even, A., Shankaranarayanan, G.: Utility-driven assessment of data quality. ACM SIGMIS Database 38, 75-93 (2007)

9. Wang, R.Y., Strong, D.M.: Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems 12, 5-34 (1996)

10. Timmers, P.: Business models for Electronic Markets. Electronic Markets 8, 3-8 (1999)

11. Alt, R., Zimmermann, H.-D.: Business Models. Electronic Markets 10, 3-9 (2001)

12. Bain, J.S.: Industrial organization. Wiley, New York, NY (1968)

13. Wernerfelt, B.: A Resource Based View of the Firm. Strategic Management Journal 5, 171-180 (1984)

14. Barney, J.: Firm Resources and Sustained Competitive Advantage. Journal of Management 17, 99-120 (1991)

15. Mintzberg, H.: Patterns in strategy formation. Management Science 24, 934-948 (1978)

16. Ginsberg, A.: Minding the Competition: From Mapping to Mastery. Strategic Management Journal 15, 153-174 (1994)

17. Porter, M.E.: Competitive Strategy: Techniques for Analyzing Industries and Competitors. The Free Press, New York (1980)

18. Zott, C., Amit, R., Massa, L.: The Business Model: Theoretical Roots, Recent Developments, and Future Research. IESE Business School - University of Navarra, Barcelona, Spain (2010)

19. Osterwalder, A., Pigneur, Y.: Business Model Generation. Wiley, Hoboken, NJ (2010)

20. Hedman, J., Kalling, T.: The business model concept: theoretical underpinnings and empirical illustrations. European Journal of Information Systems 12, 49-59 (2003)

21. Yin, R.K.: Case study research: design and methods. Sage Publications, Thousand Oaks, CA (2002)

22. Benbasat, I., Goldstein, D.K., Mead, M.: The Case Research Strategy in Studies of Information Systems. MIS Quarterly 11, 369-386 (1987)

23. Eisenhardt, K.M.: Building Theories from Case Study Research. Academy of Management Review 14, 532-550 (1989)

24. Morgan, D.L., Krueger, R.A.: When to use Focus Groups and why? In: Morgan, D.L. (ed.): Successful Focus Groups. Sage, Newbury Park, CA, 3-19 (1993)

25. Linder, J.C., Jarvenpaa, S., Davenport, T.H.: Toward an Innovation Sourcing Strategy. MIT Sloan Management Review 44, 43-51 (2003)

26. von Hippel, E.: Innovation by User Communities: Learning from Open-Source Software. MIT Sloan Management Review 42, 82-86 (2001)

27. Kotler, P.: The Prosumer Movement: A New Challenge for Marketers. In: Lutz, R.J. (ed.): Advances in Consumer Research, Vol. 13. Association for Consumer Research, Provo, UT, 510-513 (1986)

28. Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing Systems on the World-Wide Web. Communications of the ACM 54, 86-96 (2011)

29. Fjeldstad, Ø.D., Haanæs, K.: Strategy Tradeoffs in the Knowledge and Network Economy. Business Strategy Review 12, 1-10 (2001)

30. Katz, M.L., Shapiro, C.: Network Externalities, Competition, and Compatibility. American Economic Review 75, 424 (1985)


M.G. Jaatun, G. Zhao, and C. Rong (Eds.): CloudCom 2009, LNCS 5931, pp. 460–471, 2009. © Springer-Verlag Berlin Heidelberg 2009

Industrial Cloud: Toward Inter-enterprise Integration

Tomasz Wiktor Wlodarczyk, Chunming Rong, and Kari Anne Haaland Thorsen

Department of Electrical Engineering and Computer Science, University of Stavanger, N-4036 Stavanger, Norway

{tomasz.w.wlodarczyk,chunming.rong,kari.a.thorsen}@uis.no

Abstract. The industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing. The characteristics of an industrial cloud are given by its definition and architecture and compared with other general cloud concepts. The concept is then demonstrated by a practical use case, based on Integrated Operations (IO) on the Norwegian Continental Shelf (NCS), showing how an industrial digital information integration platform gives a competitive advantage to the companies involved. Further research and development challenges are also discussed.

Keywords: cloud computing, integrated operations.

1 Introduction

The increasing amount of industrial digital information requires an integrated industrial information platform to exchange, process and analyze incoming data, and to consult related information (e.g. historical data or data from other connected components) in order to obtain an accurate overview of the current operation status for subsequent decisions. Collected information may often cross the disciplines it originated from. The challenge is to handle it in an integrated, cost-effective, secure and reliable way. An enterprise may use its existing organizational structure for information classification. However, as collaborations often exist across enterprises, information flow that crosses enterprise boundaries must be facilitated. Earlier attempts have been made within single enterprises; industry-wide collaboration poses more challenges. Existing general solutions such as the information grid [1] are not adequate to deal with the complexity of these challenges.

Recently, there have been many discussions on what cloud is and is not [2-14]. Potential adopters have also been discussed [4, 7]. However, most solutions mainly focus on small and medium-sized companies adopting what is called a public cloud. Adoption of the public cloud by large companies has been discussed, but it faces significant obstacles, mainly related to security. Some of these are addressed by what is called a private cloud. In this paper, the industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing to solve the stated problem. Both the definition and the architecture of an industrial cloud are given and compared with general cloud characteristics. By extending existing cloud computing concepts, we propose a solution that may provide convenient, integrated and cost-effective adaptation. These advantages are recognized in the large-scale industrial collaboration project Integrated Operations (IO) [16], whose central element is to establish an inter-enterprise digital information integration platform for members of OLF on the Norwegian Continental Shelf (NCS) [15].

The paper consists of five sections. After a short introduction in Section 1, a brief survey of recent efforts on cloud computing is given in Section 2, and a categorization of clouds is proposed to reflect actual business models and to facilitate a more precise definition.

In Section 3 the concept of the industrial cloud is precisely defined. A generic architecture is proposed and explained. Further, a practical use case, based on Integrated Operations on the NCS, is provided to show how this industrial digital information integration platform gives a competitive advantage to the companies involved. Existing technologies that are essential parts of the industrial cloud are named and described. At the end of this section further research and development challenges are also discussed. In Section 4 a compact comparison of the three types of clouds (public, enterprise and industrial) is provided. The paper concludes with a summary of the main points.

2 Categories of Clouds

The general goals of cloud computing are to obtain better resource utilization and availability. The concept of cloud computing is sometimes presented as a grouping of various other concepts, especially SaaS, IaaS and HaaS [17], but the concept has also been defined differently from paper to paper in [2-14], indicating different models of cloud. The differences in the organization and architecture of a cloud are often influenced by the different business models the cloud computing concept is applied to. A division between public and private clouds (and hybrids between them) can be seen in several publications [8]. In this paper, public and enterprise clouds are identified by the business models they are applied to, viewed from a global perspective.

2.1 Public Cloud

Public cloud is the most common model of cloud, with popular examples such as Amazon Web Services [18] and Google App Engine [19]. One definition of public cloud, given by McKinsey [3], states that:

Clouds are hardware-based services offering compute, network and storage capacity where:

1. Hardware management is highly abstracted from the buyer
2. Buyers incur infrastructure costs as variable OPEX
3. Infrastructure capacity is highly elastic (up or down)

The public cloud is used mainly by small and medium-sized companies, very often start-ups, because it offers effortless hardware management and flexibility without any significant entrance costs. Access to the public cloud is realized through the internet. Hardware is owned and managed by an external company, so hardware issues are of no interest to the companies using it. A high degree of hardware utilization is achieved by means of virtualization (other approaches also exist [20]). The platform is generic, usually providing an application framework or access to standard computing resources.


There is no particular focus on collaboration between applications and no facilitation of reusing data between them. The public cloud features OpEx (Operational Expenditure) type billing based on actual usage or on a per-month fee, with little or usually no CapEx (Capital Expenditure).

Security and privacy might be an issue as data is stored by an external entity. On the other hand, cloud providers might have a better focus and bigger resources to address those issues than a small company [21]. Companies have no control over the cloud provider. Therefore, it is important that there are clear policies on data handling and possibly external audits [22]. The public cloud might also raise geopolitical issues because of physical data placement. That is currently solved by separate data centers in different parts of the world [18]; however, it is a questionable solution in the longer term. There is a vendor lock-in threat, resulting in problems with data transfer between cloud vendors. However, that is a bigger issue for users of cloud-based applications than for companies providing services over the cloud.

2.2 Enterprise Cloud

The enterprise cloud focuses not only on better utilization of computing resources, but also on integrating services crucial to the company’s operations and thereby optimizing them. A good example here is the Cisco vision [17].

Access to the enterprise cloud is realized mainly through the intranet, but the internet might also be used. Hardware is owned and managed by the enterprise itself. Therefore, hardware issues are still present, however to a lesser extent. Hardware utilization can be improved by means of virtualization; however, it might cover only some parts of the company’s datacenter. The platform is designed for a specific purpose and capable of supporting the company’s key operations. There is a strong focus on collaboration between applications and on facilitating the reuse and integration of data between them. The enterprise cloud can be economically beneficial to the company; however, it requires up-front investment and does not offer OpEx-type billing.

Control, security and privacy are not an issue (beyond what is required currently) as data are stored by the company itself. What is more, thanks to centralization, the security level might significantly increase [23]. There might be some geopolitical issues in the case of centralization of international operations. There is no significant vendor lock-in threat: dependence on software vendors providing cloud functionalities is more or less the same as with currently used software.

Adoption of the public cloud by large companies or enterprises has also been discussed [3], and there are already some examples of such adoptions [24]. At the same time, many companies do not even consider such a step; in their case the benefits of the public cloud are too small to counterbalance the security, privacy and control risks.

2.3 Beyond Enterprise Cloud

The enterprise cloud seems to be a good solution for integration inside a large company. Nowadays, however, enterprises face additional challenges resulting from collaboration with other enterprises in the industry. Such collaboration is necessary to stay competitive, but it requires the introduction of new technological solutions.


Some of the integration and provisioning challenges have already been discussed under the concept of the Information Grid. Notably, Semantic Web solutions were proposed to unify all the data in a company and to view them in a “smooth continuum from the Internet to the Intranet” [1]. Some authors also proposed integrating resource provisioning [25]. However, the Information Grid model, which focuses on one enterprise only, does not offer a convenient, seamless and integrated approach to practically solving inter-enterprise challenges. It is not only information data that are involved, but also work processes, and definition, operation and service models that need to be reconciled and collaborated on in a seamless way. Hence, the information grid is only a beginning. Finally, the Information Grid model does not lead to new opportunities in the industry in the way cloud computing does, e.g. lowering entrance costs for start-ups, which leads to increased competition and innovation.

Therefore, in the next section industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing. A precise definition is given and then explained by a practical use case.

3 Industrial Cloud

3.1 Definition and Architecture

An industrial cloud is a platform for industrial digital information integration and collaboration. It connects unified data standards and common ontologies with an open and shared architecture in order to facilitate data exchange and service composition between several companies. It should be controlled by an organization, e.g. in the form of a special interest group (SIG) consisting of industry representatives, to ensure the development, evolution and adoption of standards. The SIG should cooperate with an international standardization body.

In Fig. 1 the industrial cloud is presented. It binds together the enterprises in the industry as well as service companies. Enterprises are the core of the industry. Service companies usually provide services to those enterprises and very often participate in more than one industry.

In traditional business-to-business (B2B) systems metadata and semantics are agreed upon in advance and are encapsulated in the systems. However, the trend is moving towards a more open environment where communicating partners are not known in advance. This demands solutions where the semantics are explicit and standardized [26]. Information management, information integration and application integration require that the underlying data and processes can be described and managed semantically.

Collaboration and communication within an industrial cloud depend on a shared understanding of concepts. Therefore, the basic elements of the industrial cloud are unified data standards, common ontologies, an open and shared architecture, and a secure and reliable infrastructure. Unified data standards allow easy data exchange between companies. Common ontologies ensure a shared point of view on the meaning of data. Metadata need to be shared among applications, and it should be possible to semantically describe applications within the cloud. An open and shared architecture is a way to efficiently interconnect the participants in an industrial cloud.


[Figure: several enterprise clouds and service company clouds joined within one industrial cloud]

Fig. 1. Industrial Cloud

An ontology is a structure capturing semantic knowledge about a certain domain by describing relevant concepts and the relations between these concepts [27, 28]. With a shared ontology it is possible to communicate information across domains and systems, independent of local names and structuring. This enables an automatic and seamless flow of data, where information can be accessed from its original location in the same way as if it were stored locally. In [29] Noy et al. point out several reasons to construct and deploy ontologies, e.g. ease of information exchange, making it easier for a third party to extract and aggregate information from diverse systems, to change assumptions about the world, and to analyze domain knowledge.
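As a toy illustration of this idea (not taken from the paper), an ontology can be represented as subject-predicate-object triples, and local vocabularies can be mapped onto the shared concepts so that data from different systems become comparable independent of local names. All concept and company names below are hypothetical.

```python
# Shared ontology: concepts and relations as (subject, predicate, object) triples.
ontology = {
    ("Well", "is_a", "Facility"),
    ("Sensor", "attached_to", "Well"),
    ("Pressure", "measured_by", "Sensor"),
}

# Each company maps its local terms onto the shared concepts.
local_to_shared = {
    "company_a": {"WLBR": "Well", "P_SENS": "Sensor"},
    "company_b": {"borehole": "Well", "gauge": "Sensor"},
}

def shared_concept(company: str, local_name: str) -> str:
    """Resolve a local name to the concept it denotes in the shared ontology."""
    return local_to_shared[company][local_name]

# Two systems using different local names refer to the same shared concept:
assert shared_concept("company_a", "WLBR") == shared_concept("company_b", "borehole")
```

Once local names resolve to shared concepts, the relations in the ontology can be queried uniformly, regardless of which system the data originated from.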

Ontology creation should be a mainly industry-focused process. There is a current and stable trend of moving the construction of metadata from the enterprise to the industry level. A cross-industry approach might be useful; however, it is probably not feasible on a larger scale. In our current work we see that these ontologies have to be hierarchically organized depending on their level of detail. The more general ones will be common in the industry. More detailed ones might stay specific to a particular company or consortium; however, they will still reference the more general ontologies.

Data standards, together with ontologies acting on an open and shared architecture, allow for easy service composition from multiple providers. A secure and reliable infrastructure builds trust in the platform and between all participants.

It should be easy to add new applications to the cloud, and applications should be easy to find based on the services they provide. By providing applications as semantically described web services [30], based on the commonly agreed ontology, it would be easy to search for a particular service among them. Domain knowledge is extracted from the applications, not hard-coded within the systems. It is then easier to provide new services and to automatically interpret the operations provided by these services.
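To make the discovery idea concrete, the hypothetical sketch below registers services under the ontology concepts they operate on and looks them up by concept rather than by name. The concept names and endpoint URLs are invented placeholders; a real implementation would use semantically described web services [30] rather than a plain in-memory dictionary.

```python
from collections import defaultdict

# Registry: ontology concept -> service endpoints described as handling it.
registry: dict = defaultdict(list)

def register(concept: str, endpoint: str) -> None:
    """Register a service endpoint under the ontology concept it operates on."""
    registry[concept].append(endpoint)

def discover(concept: str) -> list:
    """Find all services described as operating on the given concept."""
    return registry[concept]

# Providers register against shared concepts, not hard-coded application names:
register("DrillingData", "https://service-a.example/api")
register("DrillingData", "https://service-b.example/api")
register("ProductionData", "https://service-c.example/api")

assert len(discover("DrillingData")) == 2
```

Because lookup goes through the shared concept, a consumer can find every provider of a given kind of service without knowing any provider in advance.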

An industry can form an industrial cloud in order to enable on-the-fly and automatic outsourcing and subcontracting, lower operation costs, increase the level of innovation and create new opportunities for the industry. The cloud approach can be used as a way to ensure an abstraction layer over all underlying technological solutions and integration patterns. The industrial cloud is the missing element that binds and structures existing technologies on the way to practical implementation. Fig. 2 summarizes the main goals of the industrial cloud, that is: information exchange, decision support and service composition.

[Figure: enterprises and service companies connected through shared data formats, ontologies, architecture and infrastructure, supporting information exchange, service composition and decision support within the industrial cloud, accessed via an agent]

Fig. 2. Integration, collaboration and composition in industrial cloud

So far, the industrial cloud has been defined in terms of its general purpose and the technologies used. Further, it is important to compare it with the already existing types of clouds. In Fig. 3 all three types of cloud are presented in the form of a stack of the functionalities they provide.

Looking at current providers of the public cloud, like Google App Engine [19] or AWS [18], one can see that they offer two basic functions: provisioning (mainly of processing time and storage space), and metering and billing systems for the resources they provide. The public cloud is realized through hardware virtualization (or similar technologies). The cloud provider supplies an API that is later utilized by cloud adopters.

The enterprise cloud builds on the foundation of the public cloud. Further, it adds the possibility of administrating workflows in the cloud, managing workload, and monitoring, which goes further than the simple metering in the public cloud. In this way the enterprise cloud is less general, but at the same time provides better support for large business users.

The industrial cloud is created on the basis of the public and enterprise clouds. It features easier hardware provisioning by virtualization, and it offers workflow administration, workload management and monitoring. However, it further facilitates integrational tasks like policies, reliability management, security and trust, and outsourcing and subcontracting. It adds support for semantic interpretation of data, mapping, data fusion, and service discovery and composition.

Fig. 3 visualizes why the inter-enterprise integration concept introduced in this paper forms part of cloud computing. It builds on already existing cloud models and introduces extensions to them based on the actual needs of industries. With time, some of the new functions in the industrial cloud may migrate into the lower-level clouds.

[Figure: a stack of functionalities. Public Cloud: provisioning, billing, metering. Enterprise Cloud: workflow administration, workload management, monitoring. Industrial Cloud management services (involving enterprises, service companies, the SIG and authorities): outsourcing and subcontracting, semantic interpretation, mapping and integration, security and trust, reliability management, service discovery and composition, data fusion, policy]

Fig. 3. Industrial Cloud stacked on Enterprise and Public Cloud

3.2 Example from the Integrated Operations in Oil and Gas

The oil and gas industry on the NCS has for some years now been working on the concept of Integrated Operations (IO). Integrated Operations aims at supporting the industry in “reaching better, faster and more reliable decisions”, and is expected to have a great impact on information flow between different sub-domains. IO is planned to be implemented in two steps: Generation 1 and Generation 2 (G1 and G2). G1 focuses on the integration of offshore and onshore, real-time simulation, and the optimization of key work processes. G2 integrates the operation centers of operators (enterprises) and vendors (service providers), and focuses on heavy automation of processes and optimization of processes across domains. There are several ongoing research projects related to IO. The biggest, Integrated Operations in the High North (IOHN) [16], embraces several sub-projects focusing on different aspects of IO G2. The suggested technologies rely on an underlying architecture to build upon. The industrial cloud may provide such an architectural solution.

The oil and gas industry is an information and knowledge industry. Data exist for decades and need to be shared across different businesses, domains and applications. By combining data from several sources it is possible to gain more information than if the information were kept separate. This relies on the ability to semantically recognize the content of data. At present, data are isolated in information silos. Communicating and sharing information often results in man-hours and expenses spent on mapping data from one structure to another. For example, within the sub-domain of drilling and completion alone there are more than five different communication standards to relate to, e.g. WITSML or OPC-UA. Much of the knowledge and logic is hard-coded within the different applications. It is difficult to share and transfer data to new or other systems without information loss. In recent years, ISO15926 has been developed as an upper-level data integration standard and ontology that could enable data sharing among several companies. It has proved successful in initial tests. With the use of a shared ontology, metadata are extracted from the applications and presented in a way that can be more easily shared among partners.
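The role of an upper-level standard such as ISO15926 can be sketched as a normalization step: records expressed in different local standards are translated into one shared vocabulary before they are combined. The standard names and field names below are hypothetical placeholders, not actual WITSML, OPC-UA or ISO15926 terms.

```python
# Hypothetical field mappings from two local standards onto a shared vocabulary.
MAPPINGS = {
    "standard_x": {"md": "measured_depth_m", "bp": "bit_pressure_kpa"},
    "standard_y": {"depth": "measured_depth_m", "pressure": "bit_pressure_kpa"},
}

def normalize(record: dict, standard: str) -> dict:
    """Translate a record's local field names into the shared vocabulary."""
    mapping = MAPPINGS[standard]
    return {mapping[field]: value for field, value in record.items()}

# Records from two different standards become directly comparable:
a = normalize({"md": 1520.0, "bp": 890.0}, "standard_x")
b = normalize({"depth": 1520.0, "pressure": 890.0}, "standard_y")
assert a == b == {"measured_depth_m": 1520.0, "bit_pressure_kpa": 890.0}
```

Each company maintains only one mapping to the shared standard instead of a pairwise mapping to every partner's format, which is exactly why an agreed upper-level standard reduces the integration effort described above.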

Data are often stored in several places, and over time these copies tend to become inconsistent. Barriers between isolated information domains need to be broken down. There is a need for solutions where data can be accessed directly from the source. The industrial cloud focuses on cross-company application collaboration, and will ease the communication and access of data across company and application boundaries.

The oil and gas industry has already developed SOIL, an industrial communication network that provides high reliability and independence from other solutions like the internet. However, SOIL does not offer any kind of collaboration and integration facilities apart from a secure network connection. The industry consists of many companies, both small and large. Service companies providing services to several operators spend much time on integration with the operators’ systems. With an underlying cloud architecture, service providers can offer new services to the cloud as a whole, without the need for tailored integration with all the different operators.

The industrial cloud could serve as a platform for the actual delivery of Integrated Operations on the NCS. It is capable of providing easy, abstracted access to all the aforementioned technological solutions, integrating them in one efficient and simple product.

3.3 Challenges and Further Work

The industrial cloud can be the solution to the problem of inter-enterprise digital information integration and collaboration. However, there are a few challenges that should be the subject of research and industrial effort while practically implementing the industrial cloud concept.

Integration and collaboration require inter-enterprise standardization. To achieve this, different definitions or names for the same concept, different data formats, and different work procedures have to be reconciled. This is usually easier said than done. For example, ISO15926 is still far from completion after more than ten years of effort with the participation of major actors in the domain.

The biggest challenge is security. How can each company's data be secured without impeding collaboration? Multi-level authentication could be a solution. However, more development in this field is needed, as proper security solutions will be a key element of the industrial cloud.
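As a toy illustration of the multi-level idea, access can be granted by comparing a partner's clearance level against each data item's sensitivity level; the levels and the check below are hypothetical, not part of any proposed industrial cloud design:

```python
# Hypothetical multi-level check: a partner may read an item only if its
# clearance level is at least the item's sensitivity level.
LEVELS = {"public": 0, "partner": 1, "confidential": 2}

def can_read(clearance: str, item_level: str) -> bool:
    return LEVELS[clearance] >= LEVELS[item_level]

# A service company cleared at "partner" level sees shared data,
# but not another operator's confidential data.
assert can_read("partner", "public")
assert not can_read("partner", "confidential")
```

The point of the sketch is that collaboration and protection can coexist: shared levels stay open to partners while each company's sensitive data remains restricted.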

Other challenges include dealing with many versions of the truth for reasoning purposes, which is a result of the shared environment and the integration of data in many formats. These topics are the subject of current research in the Semantic Web field [31].

Enabling old data to be used in the new environment is also a challenge. It is important because companies want to use all the data they already have. There have already been interesting attempts to do this [32].

Communication, and possibly synchronization, between the industrial cloud and enterprise clouds is not yet solved. Similar, but not identical, problems are already being investigated in the form of synchronization between private and public clouds [8].

As outsourcing can be automated, there is a need for automated contracting solutions, which have been a topic of recent research [33].



4 Cloud Categories Comparison

The industrial cloud should also be compared with other types of cloud in terms of how it is implemented, who uses it, and what the problematic issues are. This is summarized in Table 1. In contrast with the public cloud, the industrial cloud is used by large companies together with smaller companies, and in contrast with the enterprise cloud, it focuses on collaboration between several companies. Access to the industrial cloud is realized

Table 1. Cloud categories comparison

Who and why
- Public: small and medium companies; to lower hardware maintenance costs
- Enterprise: large companies; to integrate internal services
- Industrial: large and other companies in one industry; to integrate inter-enterprise collaboration

Network
- Public: internet
- Enterprise: intranet (and internet)
- Industrial: extranet (and internet)

Hardware
- Public: external owner; aggressive virtualization
- Enterprise: owned by the enterprise; some virtualization
- Industrial: many owners; some individual virtualization; cross-company virtualization not probable

Platform
- Public: programming and resources access
- Enterprise: supporting integration of operations
- Industrial: focused on integration and collaboration

Applications
- Public: various; no collaboration
- Enterprise: company specific; collaboration
- Industrial: enterprise specific; collaboration and composition

Economics
- Public: OpEx
- Enterprise: CapEx
- Industrial: CapEx, some OpEx possible

Security and privacy
- Public: might be higher in some aspects, but privacy is a significant problem
- Enterprise: security will increase as a result of central enforcement of policies
- Industrial: crucial issue; need for top-level security while preserving collaboration

Control
- Public: problem; need for open policies and external audits
- Enterprise: not an issue; everything controlled by one company
- Industrial: not an issue; controlled by the SIG (collaborating with an international standards authority)

Geopolitics
- Public: problem; geographically dependent data centers are only a temporary solution
- Enterprise: some issues, but the company should be ready to deal with them
- Industrial: some issues, but the industry should be ready to deal with them

Vendor lock-in
- Public: problem; open standards should help
- Enterprise: not a problem; everything owned by the company
- Industrial: not a significant problem; issue controlled by the SIG



through an extranet or the internet. Hardware is owned and managed independently by many companies, though some of the hardware in each company will follow a shared standard of open architecture. Based on that, some companies can provide access to their data centers to other companies. This will improve hardware utilization and also facilitate agent mobility.

The platform is designed for a specific purpose and is capable of supporting the industry's key operations. There is a strong focus on collaboration between applications and on facilitating the reuse and integration of data between them. Security and privacy are crucial issues, as data must be shared and protected at the same time. Because of security and reliability needs, an extranet implementation will often be advisable. Some geopolitical issues might appear; however, industries are probably already aware of them. The vendor lock-in threat is not a significant issue as long as the industrial cloud is wisely managed by the SIG; in fact, it might be much smaller than it is currently. The SIG should be organized at the industry level. A cross-industrial approach would most probably create many SIGs, which would jeopardize the standardization process. It should be possible to avoid this at the industry level, even though it is definitely a challenge.

5 Summary

In this paper, the industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing. Both a definition and an architecture of the industrial cloud are given in comparison with general cloud characteristics. The concept is then demonstrated by a practical use case, based on IO on the NCS, showing how an industrial digital information integration platform gives competitive advantage to the companies involved. The oil and gas industry on the NCS recognizes the great potential value in the full implementation and deployment of the industrial cloud, where integration and collaboration are the key.

References

1. Alonso, O., Banerjee, S., Drake, M.: The Information Grid: A Practical Approach to the Semantic Web, http://www.oracle.com/technology/tech/semantic_technologies/pdf/informationgrid_oracle.pdf

2. Mitra, S.: Deconstructing The Cloud (2008), http://www.forbes.com/2008/09/18/mitra-cloud-computing-tech-enter-cx_sm_0919mitra.html

3. Forrest, W.: McKinsey & Co. Report: Clearing the Air on Cloud Computing (2009), http://uptimeinstitute.org/images/stories/McKinsey_Report_Cloud_Computing/clearing_the_air_on_cloud_computing.pdf

4. Buyya, R., Chee Shin, Y., Venugopal, S.: Market-Oriented Cloud Computing: Vision, Hype, and Reality... In: 10th IEEE International Conference on High Performance Computing and Communications, HPCC 2008 (2008)

5. Douglis, F.: Staring at Clouds. IEEE Internet Computing 13(3), 4–6 (2009)

6. Grossman, R.L.: The Case for Cloud Computing. IT Professional 11(2) (2009)



7. Hutchinson, C., Ward, J., Castilon, K.: Navigating the Next-Generation Application Architecture. IT Professional 11(2), 18–22 (2009)

8. IBM: IBM Perspective on Cloud Computing (2008), http://ftp.software.ibm.com/software/tivoli/brochures/IBM_Perspective_on_Cloud_Computing.pdf

9. Lijun, M., Chan, W.K., Tse, T.H.: A Tale of Clouds: Paradigm Comparisons and Some Thoughts on Research Issues. In: Asia-Pacific Services Computing Conference 2008, APSCC 2008. IEEE, Los Alamitos (2008)

10. Lizhe, W., et al.: Scientific Cloud Computing: Early Definition and Experience. In: 10th IEEE International Conference on HPCC 2008 (2008)

11. Youseff, L., Butrico, M., Da Silva, D.: Toward a Unified Ontology of Cloud Computing. In: Grid Computing Environments Workshop, GCE 2008 (2008)

12. Rayport, J.F., Heyward, A.: Envisioning the Cloud: The Next Computing Paradigm (2009), http://www.marketspaceadvisory.com/cloud/Envisioning_the_Cloud_PresentationDeck.pdf

13. Weinhardt, C., et al.: Business Models in the Service World. IT Professional 11(2), 28–33 (2009)

14. Open Cloud Manifesto (2009), http://www.opencloudmanifesto.org/

15. Map of the Norwegian continental shelf (2004), http://www.npd.no/English/Produkter+og+tjenester/Publikasjoner/map2003.htm

16. Integrated Operations in the High North, http://www.posccaesar.org/wiki/IOHN

17. Gore, R.: The experience of Web 2.0 Communications and collaboration tools in a global enterprise - The road to 3.0 (2009), http://www.posccaesar.org/svn/pub/SemanticDays/2009/Session_1_Rich_Gore.pdf

18. Amazon Web Services, http://aws.amazon.com

19. Google App Engine, http://code.google.com/appengine/

20. Perilli, A.: Google fires back at VMware about virtualization for cloud computing (2009), http://www.virtualization.info/2009/04/google-fires-back-at-vmware-about.html

21. Have You Adopted Small Business Cloud Computing? (2009), http://www.smallbusinessnewz.com/topnews/2009/02/04/have-you-adopted-small-business-cloud-computing

22. Gartner: Seven cloud-computing security risks (2008), http://www.infoworld.com/d/security-central/gartner-seven-cloud-computing-security-risks-853

23. Should an organization centralize its information security division? (2006), http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1228539,00.html

24. Google Apps makes its way into big business (2009), http://www.computerweekly.com/Articles/2008/06/24/231178/google-apps-makes-its-way-into-big-business.htm

25. Taylor, S., Surridge, M., Marvin, D.: Grid Resources for Industrial Applications. In: IEEE International Conference on Web Services (2004)

26. Aassve, Ø., et al.: The SIM Report - A Comparative Study of Semantic Technologies (2007)



27. Antoniou, G., van Harmelen, F.: A Semantic Web Primer, 2nd edn. MIT Press, Cambridge (2008)

28. Grobelnik, M., Mladenić, D.: Knowledge Discovery for Ontology Construction. In: Davies, J., et al. (eds.) Semantic Web Technologies, pp. 9–27 (2006)

29. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A guide to.., Stanford Knowledge Systems Laboratory Technical Report, p. 25 (2001)

30. Roman, D., et al.: Semantic Web Services - Approaches and Perspectives. In: Davies, J., Studer, R., Warren, P. (eds.) Semantic Web Technologies: Trends and Research in Ontology-based Systems, pp. 191–236. John Wiley & Sons, Chichester (2006)

31. W3C Semantic Web Activity (2009), http://www.w3.org/2001/sw/

32. Calvanese, D., Giacomo, G.d.: Ontology based data integration (2009), http://www.posccaesar.org/svn/pub/SemanticDays/2009/Tutorials_Ontology_based_data_integration.pdf

33. Baumann, C.: Contracting and Copyright Issues for Composite Semantic Services. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 895–900. Springer, Heidelberg (2008)


5

DEVELOPING A BUSINESS CASE AND A DATA QUALITY ROAD MAP

CHAPTER OUTLINE
5.1 Return on the Data Quality Investment
5.2 Developing the Business Case
5.3 Finding the Business Impacts
5.4 Researching Costs
5.5 Correlating Impacts and Causes
5.6 The Impact Matrix
5.7 Problems, Issues, Causes
5.8 Mapping Impacts to Data Flaws
5.9 Estimating the Value Gap
5.10 Prioritizing Actions
5.11 The Data Quality Road Map
5.12 Practical Steps for Developing the Road Map
5.13 Accountability, Responsibility, and Management
5.14 The Life Cycle of the Data Quality Program
5.15 Summary

One of the most frequently asked questions about developing a data quality program is "how do we develop a convincing business case for investing in information quality improvement?" In this chapter we look at how our characterization of risks associated with ignoring data quality problems can be presented to senior management as an opportunity for developing competitive advantage, and what considerations for staffing and planning can be compiled into a tactical road map for deploying a data quality strategy.

One of the major issues is that the senior managers who already recognize the value of improved data quality don't need justification to initiate a data quality program. However, organizational

© 2011 Elsevier Inc. All rights reserved. DOI: 10.1016/B978-0-12-373717-5.00005-1



best practices require that some form of business case be assembled and presented to a governing body to justify the investment in any kind of activity. A data quality improvement program is a serious commitment on behalf of an organization, and its importance deserves to be effectively communicated to all of the business managers who may participate, either as sponsors or as beneficiaries.

In chapter 1, we identified key impact dimensions and corresponding impact categories associated with poor data quality. The process of building a business case to justify both the technology and the organizational infrastructure necessary to ensure a successful program requires additional research and documentation, namely:
• Quantification of identified financial impacts,
• Assessment of the actual financial impacts,
• Determination of the source of the actual root causes in the information processing that are correlated to those impacts,
• Diagnosis of the root cause of the process failure,
• Determination of potential remediation approaches,
• The costs to remediate those process failures, and
• A way to prioritize and plan the solutions of those problems.

All of this information can be accumulated into a pair of templates: one for impact analysis and the other for estimating the opportunity for value improvement or creation. In particular, the impact template is used to document the problems, issues, business impacts, and quantifiers. Together, all this information enables the practitioner to estimate a quantified yearly incurred impact attributable to poor data quality.

5.1 Return on the Data Quality Investment

What is the purpose of developing a return on investment (ROI) model? In many situations, the ROI formulation is used before starting a project purely for the purpose of project approval and initiation and is then forgotten. In other environments, the ROI calculation is made after the fact as a way of demonstrating that some activity had some kind of positive business impact. In either situation, the ROI model is a marketing device. But while one might consider this approach appropriate for projecting a return on investment, it is also reasonable to consider whether the expected "returns" are directly (and predictably) attributable to operations that are within the organization's control.



As an example, let's say the national tax collection agency (in the United States, that is the Internal Revenue Service) has built a business case for the investment of a large amount of money to reengineer its software systems, using an expected increase in tax collections as the business justification. The ROI model suggests that building a more modern application system will result in greater collections. The improved system may account for more precision in calculating and collecting taxes, but in reality the amount of taxes collected depends on more than just the computer application. A downturn in the economy might result in more people out of work, legislation may mandate a freeze on the minimum wage or lower the tax rates, or natural disasters may result in migratory populations that are difficult to track down and contact. In essence, justifying the creation of a new application system based on increased collections ignores the fact that the expected performance results depend on a number of other variables beyond the organization's control.

5.2 Developing the Business Case

Therefore, the intention is not just to provide information that can be used to justify a data quality program; it is to provide a foundation for continuing to use the knowledge acquired during this phase to manage performance improvement over the data quality life cycle. If the impacts are truly related to poor data quality, then improving data quality will alleviate the pain in a measurably correlated manner.

In turn, then, the ROI model becomes a management tool to gauge the effectiveness of the program. If improving data quality really will lead to improvements in achieving the business objectives (as is to be claimed by the business case), then the same measures used to determine the "value gap" can be used to monitor performance improvement!

The process for developing a business case is basically a quest to identify a "value gap" associated with data quality – the area of greatest opportunity for creating new value with the optimal investment. Following the process summarized in Figure 5.1 will help the analyst team identify the opportunities with the highest value and, therefore, the highest priority.

5.3 Finding the Business Impacts

It is highly probable that not only will there be an awareness of existing data quality issues, there will also be some awareness of the magnitude of the impacts these issues incur. The value of


Figure 5.1 Developing a business case for data quality management. (Process steps: identify business impacts; research costs; correlate impacts and causes; build the impact matrix; research root causes; map impacts to data flaws; estimate value gap; prioritize tasks.)


the impact taxonomy developed based on the material in chapter 1 is twofold. First, by clearly specifying the many different impacts, it is possible to trace some of the issues back through the processing stages and determine whether some number can be attributed to a single process failure. Second, it shows how the results of different data quality events can be grouped together, which simplifies the research necessary to determine financial impact.

5.3.1 Roles and Responsibilities

Although there may be some awareness of existing issues, as a practical matter, the process of identifying and categorizing impacts is best performed as a collaborative effort among the line-of-business managers and their supporting information technology staff. The early process of identification, by necessity, relates poor data quality to business issues, which requires knowledge of both business processes and how applications support those processes. Therefore, a small team consisting of one business representative and one IT representative from each line of business should assemble to expose those issues that will drive the business case.

This meeting should be scheduled for an extended block of time (half a day) and convene at a location that is away from distractions such as telephone and email. One attendee should be designated as a scribe to document the discussion.

5.3.2 Clarification of Business Objectives

Because data quality management is often triggered by acute events, the sentiment may be reactive ("what do we do right now to improve the quality?"), perhaps with some level of anxiety.



To alleviate this, it is necessary to level-set the meeting and ensure that every participant is aware that the goal is to come up with clearly quantifiable issues attributable to unexpected data.

To achieve this, it is useful for each group's business participant to prepare a short (10-minute) overview of that group's business objectives – what services the group provides, what investment is made (staffing and otherwise) in providing those services, and how success is quantified. Next, each group's IT participant should provide a short overview of how information is used to support the group's services and achieve the business objectives.

5.3.3 Identification and Classification

The next step, then, in developing the business case is to clearly identify the issues attributable to poor data quality and to determine whether they indeed are pain points for the organization. Again, we can employ the impact categories described in chapter 1 in this process, mostly from the top down, by asking these questions:
• Where are the organization's costs higher than they should be?
• Are there any situations in which the organization's revenues are below expectations? (Note: for nonprofit or governmental organizations, you may substitute your quantifiable objectives for the word revenues.)
• Are there areas where confidence is lowered?
• What are the greatest areas of risk?

The answers to these questions introduce areas for further concentration, in which the questions can be refined to focus on our specific topic by appending the phrase "because of poor data quality" at the end (e.g., "Where are the organization's costs higher than they should be because of poor data quality?"). The analyst can again employ the taxonomy at a lower level, asking questions specifically about the lower levels of the hierarchy. For example, if the organization's costs are higher, is it due to error detection, correction, scrap, rework, or any other area of increased overhead costs?

5.3.4 Identifying Data Flaws

At the same time, it will be necessary to understand how the impact is related to poor data quality. Most often a direct relation can be assessed – each issue has some underlying cause that can be identified at the point of manifestation. For example, extra



costs associated with shipping ordered items occur when the original shipping address is incorrect and the item is returned and needs to be shipped a second time.

Data flaws are the result of failed processes, so understanding the kinds of data flaws that are causing the impacts will facilitate root cause analysis. At the end of this stage, there should be a list of data flaws and business impacts that require further investigation for determination of financial impact, assessment of measurement criteria, and setting of performance improvement goals.

5.4 Researching Costs

The next step in the process is to get a high-level view of the actual financial impacts associated with each issue. This step combines subject matter expertise with some old-fashioned detective work. Because the intention of developing a business case is to understand gross-level impacts, it is reasonable to attempt to get a high-level impact assessment that does not require significant depth of analysis. To this end, there is some flexibility in exactness of detail. In fact, much of the relevant information can be collected in a relatively short time.

In this situation, anecdotes are good starting places, since they are indicative of high-impact, acute issues with high management visibility. Since the current issues have probably been festering for some time, there will be evidence of individuals addressing the manifestation of the problem in the past. Historical data associated with work/process flows during critical data events are a good source of cost/impact data.

To research additional impact, it is necessary to delve deeper into the core of the story. To understand the scope, it is valuable to ask these kinds of questions:
• What is it about the data that caused the problem?
• How big is the problem?
• Has this happened before?
• How many times?
• When this happened in the past, what was the remediation process?
• What was done to prevent it from happening again?

Environments with event and issue tracking systems have a head start, as the details will have been captured as part of the resolution workflow. Alternatively, organizations with formal change control management frameworks can review recommended and implemented changes triggered as a result of issue remediation.



An initial survey of impact can be derived from this source – detection, correction, scrap and rework, and system development risks are examples of impact categories that can be researched through this resource.

At the same time, consult issue tracking system event logs and management reports on staff allocation for problem resolution, and review external impacts (e.g., stock price, customer satisfaction, management spin) to identify key quantifiers for business impact.

5.5 Correlating Impacts and Causes

The next step in developing the business case involves tracking the data flaws backward through the information processing flow to determine at which point in the process the data flaw was introduced. Since many data quality issues are very likely to be process failures, eliminating the source of the introduction of bad data upstream will provide much greater value than just correcting bad data downstream.

Consider the example in Figure 5.2. At the data input processing stage, a customer name and contact information are

Figure 5.2 An example of how one data flaw causes multiple impacts. (A customer contact name and contact info are misspelled at the data entry point; the customer name does not match the customer database, and a new record is inserted with invalid information; the impacts reach Customer Service, Accounts Receivable, and Fulfillment.)


Figure 5.3 An example of an impact template, with columns Problem, Issue, Business Impact, Quantifier, and Yearly Incurred Impact.


incorrectly entered. At the next stage, in which an existing customer record is to be located, the misspelling prevents the location of the record, and a new record is inadvertently created. Impacts are manifested at Customer Service, Accounts Receivable, and Fulfillment.

In this supply chain example, it is interesting to note that each of the client application users would assume that their issues were separate ones, yet they all stem from the same root cause. The value in assessing the location of the introduction of the flaw into the process is that when we can show that one core problem has multiple impacts, the value of remediating the source of the problem will be much greater.

5.6 The Impact Matrix

The answers to the questions combined with the research will provide insight into quantifiable costs, which will populate an impact matrix template. A simple example, shown in Figure 5.3, is intended to capture information about the different kinds of impacts and how they relate to specific problems. In this example, there are five columns in the impact matrix:




1. Problem – this is the description of the original source problem.
2. Issue – this is a list of issues that are attributable to the problem. There may be multiple issues associated with a specific problem.
3. Business Impact – this describes the different business impacts that are associated with a specific issue.
4. Quantifier – this describes a measurement of the severity of the business impact.
5. Periodic Accumulated Impact – this provides a scaled representation of the actual costs that are related to the business impact over a specified time frame, such as the "yearly impact" shown in Figure 5.3.

We will walk through an example of how the template in Figure 5.3 can be populated to reflect how invalid data entry at one point in the supply chain management process results in impacts incurred at each of three different client application areas. For each business area, the corresponding impact quantifiers are identified, and then their associated costs are projected and expressed as yearly incurred impacts.
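The five-column template can be held in a simple nested structure, keyed by problem, then issue, with one record per business impact. A sketch in Python, where the dictionary layout and field names are an illustrative choice rather than anything prescribed by the book, and the dollar figures are the sample values used later in this chapter:

```python
# A minimal impact matrix: problem -> issue -> list of impacts, each with
# a quantifier and (where known) a yearly incurred impact in dollars.
impact_matrix = {
    "Customer contact name, contact info misspelled at data entry point": {
        "Inability to clearly identify known customers leads to duplication": [
            {"impact": "Increased inbound call center calls",
             "quantifier": "Staff time", "yearly_impact": 30000.00},
            {"impact": "Lost payments",
             "quantifier": "Overdue receivables", "yearly_impact": 250000.00},
            {"impact": "Decreased customer satisfaction",
             "quantifier": "Call drop rate, re-calls", "yearly_impact": None},
        ],
    },
}

def total_yearly_impact(matrix) -> float:
    """Sum every quantified yearly impact across all problems and issues."""
    return sum(row["yearly_impact"]
               for issues in matrix.values()
               for rows in issues.values()
               for row in rows
               if row["yearly_impact"] is not None)

assert total_yearly_impact(impact_matrix) == 280000.00
```

Note that non-monetary quantifiers are kept in the structure (with no dollar value) so they remain available for current-state assessment even though they do not enter the cost roll-up.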

In our impact matrix, the intention is to document the critical data quality problems, so that an analyst can review the specific issues that occur within the enterprise and then enumerate all the business impacts incurred by each of those issues. Once the impacts are specified, we simplify the process of assessing the actual costs, which we also incorporate into the matrix. The resulting matrix reveals the summed costs that can be attributed to poor data quality.

5.7 Problems, Issues, Causes

The first column of the impact matrix to be filled describes the problems and the associated data quality issues. Figuring out the presumptive error that leads to business impacts grounds the later steps of determining alternatives for remediation.

In our example, shown in Figure 5.4, it had already been determined that the source problem is the incorrect introduction of customer identifying information at the data entry point. The issue, though, describes why it is a problem. Note that there may be multiple data issues associated with each business problem.

5.8 Mapping Impacts to Data Flaws

The next step is to evaluate the business impacts that occur at all of the line-of-business applications. These effectively describe the actual pain experienced as a result of the data flaw and provide


Figure 5.4 Identifying problems and their issues. (Problem: customer contact name, contact info misspelled at data entry point. Issue: inability to clearly identify known customers leads to duplication.)


greater detail as to why the source problem causes organizational pain. In our example, as seen in Figure 5.5, there are specific business impacts within each vertical line of business. These business impacts are added to the impact matrix, as shown in Figure 5.6.

These business impacts are the same ones identified using the process in section 5.3. Although these are categorized in the impact matrix in relation to the source problem, it is valuable to maintain other classifications. For example, the different areas of shading reflect the application or line of business. We could also track how each impact falls into the business impact categories of chapter 1.

5.9 Estimating the Value Gap

The next step is to enumerate the quantifiers associated with the business impact and calculate a cost impact that can be projected over a year's time. Realize that not all business impacts are


Figure 5.5 Determining the actual business impacts and how they relate to the source problem. (Customer Service: 1. increased number of inbound calls; 2. increase in relevant call statistics; 3. decreased customer satisfaction. Accounts Receivable: 4. lost payments; 5. increased audit demand; 6. impacted cash flow. Fulfillment: 7. increased shipping costs; 8. increased returns processing; 9. decreased customer satisfaction.)


necessarily quantified in terms of money. In our example, shown in Figure 5.7, some of the quantifiers are associated with monetary amounts (e.g., staff time, overdue receivables, increased shipping costs), whereas others are quantified with other organizational objectives (e.g., customer satisfaction, call center productivity). If the quantifier does not specifically relate to a monetary value, we will document it as long as the impact is measurable.

In the version of the impact matrix in Figure 5.7 we have identified hard quantifiers and, based on those quantifiers, some sample incurred impacts rolled up over a year's time. For example, the increase in inbound calls resulted in the need for additional staff time allocated to fielding those calls, and that additional time summed to $30,000 for the year. Auditing the accounts receivable might show that $250,000 worth of products have been ordered and shipped but not paid for, an impact on revenues. Products shipped to the wrong location, returned, and reshipped had an average cost of $30 and took place 50 times per week, which equals $78,000 per year.
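The reshipping figure follows directly from the stated unit cost and weekly rate; a quick check of the arithmetic (the 52-week year is the usual assumption here):

```python
# Yearly incurred impact of reshipping: average cost per occurrence
# times occurrences per week times weeks per year.
cost_per_reship = 30       # dollars, from the example
reships_per_week = 50
weeks_per_year = 52

yearly_reshipping_impact = cost_per_reship * reships_per_week * weeks_per_year
assert yearly_reshipping_impact == 78000
```

The same projection pattern (unit cost times frequency times period) applies to any quantifier that can be observed at a per-event level.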


Figure 5.6 Adding business impacts for each of the issues. (Problem: customer contact name, contact info misspelled at data entry point. Issue: inability to clearly identify known customers leads to duplication. Business impacts: increased number of inbound call center calls; increase in relevant call statistics; decreased customer satisfaction; lost payments; increased audit demand; impacted cash flow; increased shipping costs; increased returns processing; decreased customer satisfaction.)


One of the big challenges is determining the quantifiers and the actual costs, because often those costs are buried within ongoing operations or are not differentiable from the operational budget. One rule of thumb to keep in mind is to be conservative. Documenting hard quantifiers is necessary since they will be used for current state assessment and identification of long-term target improvement goals. The objective is to come up with estimates that are both believable and supportable, but most of all, can be used for establishing achievable performance improvement goals. If the numbers are conservatively developed, the chances that changes to the environment will result in measurable improvement are greater.

We are not done yet; realize that a business case doesn't just account for the benefits of an improvement program – it also must factor in the costs associated with the improvements. Therefore, we need to look at the specific problems that are the root causes and what it would cost to fix those problems. In this


[Figure 5.7 Quantifiers and estimated costs.]

Problem: Customer contact name, contact info misspelled at data entry point
Issue: Inability to clearly identify known customers leads to duplication

| Business Impact                               | Quantifier                                   | Yearly Incurred Impact |
| Increased number of inbound call center calls | Staff time                                   | $30,000.00             |
| Increase in relevant call statistics          | Average call duration, throughput, hold time |                        |
| Decreased customer satisfaction               | Call drop rate, re-calls                     |                        |
| Lost payments                                 | Overdue receivables                          | $250,000.00            |
| Increased audit demand                        | Staff time                                   | $20,000.00             |
| Impacted cash flow                            | Cash flow volatility                         |                        |
| Increased shipping costs                      | Increased shipping costs                     | $78,000.00             |
| Increased returns processing                  | Staff time                                   | $23,000.00             |
| Decreased customer satisfaction               | Attrition, order reduction (time or size)    |                        |


step, we evaluate the specific issues and develop a set of high-level improvement plans, including analyst and developer staff time along with the costs of acquiring data quality tools. We can use a separate template, the remediation matrix (shown in Figure 5.8), that illustrates how potential solutions solve the core problem(s), and what the costs are for each proposed solution.

Figure 5.8 shows an example remediation matrix, documenting the cost of each solution, which also allows us to allocate the improvement to the documented problem (and its associated impacts). Again, at this stage in the process it may not be necessary to identify the exact costs, but rather to get a ballpark estimate.

5.10 Prioritizing Actions

Because multiple problems across the enterprise may require the same solution, this opens up the possibility for economies of scale. It also allows us to amortize both the staff and technology


[Figure 5.8 Quantifiers and estimated costs.]

Problem: Customer contact name, contact info misspelled at data entry point
Issue: Inability to clearly identify known customers leads to duplication
Solution: Parsing and standardization, record linkage tools for cleansing
Implementation costs: $150,000.00 for license; 15% annual maintenance
Staffing: .75 FTE for 1 year; .15 FTE for annual maintenance


investment across multiple problem areas, thereby further diluting the actual investment attributable to each area of business impact.
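The amortization logic can be sketched as follows. The license fee, FTE fractions, and maintenance percentage come from the Figure 5.8 example; the fully loaded FTE rate and the number of impact areas are assumptions for illustration, not figures from the chapter:

```python
# Hedged sketch: first-year cost of a remediation (license, implementation
# staffing, maintenance fee, maintenance staffing), amortized across the
# problem areas it addresses. FTE_RATE is an assumed fully loaded cost.

FTE_RATE = 120_000.00  # assumption: fully loaded annual cost per FTE

def first_year_cost(license_fee: float, impl_fte: float,
                    maint_pct: float, maint_fte: float) -> float:
    return (license_fee                # tool license
            + impl_fte * FTE_RATE      # implementation staffing
            + license_fee * maint_pct  # annual maintenance fee
            + maint_fte * FTE_RATE)    # ongoing maintenance staffing

total = first_year_cost(150_000.00, 0.75, 0.15, 0.15)
per_area = total / 3  # amortized across, say, three areas of business impact
print(f"total ${total:,.0f}, per area ${per_area:,.0f}")
# total $280,500, per area $93,500
```

Dividing the total by the number of impact areas served is what makes a shared solution cheaper per problem than a point fix.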

Essentially, we can boil the prioritization process down to simple arithmetic:
• Each data issue accounts for some conservatively quantifiable gap in value over a specified time period.
• The root cause of each data issue can be remediated with a particular initial investment plus a continuous investment over the same specified time period.
• For each data issue, calculate the opportunity value as the value gap minus the remediation cost.

One can then sort the issues by the opportunity value, which will highlight those issues whose remediation will provide the greatest value to the organization. Of course, this simplistic model is a starting point, and other aspects can be integrated into the calculations, such as:
• Time to value,
• Initial investment in tools and technology,
• Available skills, and
• Learning curve.
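The prioritization arithmetic above can be sketched directly. The issue names and value gaps echo the chapter's running example; the remediation costs are illustrative assumptions:

```python
# Opportunity value = value gap minus remediation cost, sorted descending
# so that the highest-return remediations surface first.

issues = [
    {"issue": "duplicate customer records", "value_gap": 30_000,
     "remediation_cost": 15_000},
    {"issue": "overdue receivables", "value_gap": 250_000,
     "remediation_cost": 80_000},
    {"issue": "reshipping errors", "value_gap": 78_000,
     "remediation_cost": 25_000},
]

for issue in issues:
    issue["opportunity"] = issue["value_gap"] - issue["remediation_cost"]

ranked = sorted(issues, key=lambda i: i["opportunity"], reverse=True)
for i in ranked:
    print(f'{i["issue"]}: ${i["opportunity"]:,}')
# overdue receivables: $170,000
# reshipping errors: $53,000
# duplicate customer records: $15,000
```

Refinements such as time to value or learning curve would enter as extra terms or weights in the `opportunity` calculation.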

Any organization must cast the value within its own competencies and its feasibility of execution. Although these templates provide a starting point, there is value in refining the business case development process to ensure that a valid return


on investment can be achieved while delivering value within a reasonable time frame.

5.11 The Data Quality Road Map

We now have two inputs for mapping out a plan for implementing a data quality management program. Pragmatically, they are the value gap analysis described in this chapter and the data quality maturity model described in chapter 3. The road map combines the two by considering the level of maturity that is necessary to address the prioritized issues in the appropriate order of execution.

Though one may aspire to achieve the highest level of maturity across all of the data quality framework components, the complexity introduced by the different kinds of challenges, combined with the oftentimes advisory role played by the data quality manager, limits the mandate that can be imposed on the enterprise. Instead, it is desirable to propose a data quality vision that supports the business objectives of the organization yet remains pragmatically achievable within the collaborative environment of the enterprise community. A practical approach is to target a level of maturity at which the necessary benefits of data quality management are achieved for the enterprise while streamlining the acceptance path for the individuals who will ultimately be contributing to the data quality effort.

Given that targeted level of maturity, the next step is to lay out a road map for attaining that objective, broken out by phases that have achievable milestones and deliverables. These milestones and deliverables can be defined based on the descriptions of the component maturity in chapter 4. A typical implementation road map will contain five phases:
1. Establish fundamentals
2. Formalize data quality activities
3. Deploy operational aspects
4. Establish level of maturity
5. Assess and fine-tune

At the end of the final phase, there is an opportunity to review whether the stated objectives are met and whether it is reasonable to target a higher level of maturity.

For example, consider this road map for attaining level 3 in the maturity model, which requires establishing the components detailed within levels 2 and 3 in chapter 3. The data quality strategy is deployed in five phases, with the objective of each phase being to implement the best practices that are specified in the detailed data quality maturity model.


5.11.1 Establish Fundamentals

Phase 1 establishes the fundamental organizational concepts necessary for framing the transition toward a high-quality environment, with the following milestones:
• A framework for collaboration and sharing of knowledge between application managers, business clients, and IT practitioners is put in place.
• Technology and operational best practices are identified, collected, and distributed via the collaboration framework.
• The relevant dimensions of data quality associated with data values are identified and are recognized as relevant by the business sponsors.
• Privacy, security, authorization, and limitation of use policies are articulated in ways that can be implemented.
• Tools for assessing objective data quality are available.
• Data standards are adopted.
• There is a process for characterizing areas of impact of poor data quality.
• Data quality rules are defined to identify data failures in process.

5.11.2 Formalize the Data Quality Activities

During phase 2, steps are taken to more formally define data quality activities and to take the initial steps in collaborative data quality management:
• Key individuals from across the enterprise form a data quality team to devise and recommend a data governance program and policies.
• Expectations associated with dimensions of data quality associated with data values can be articulated.
• Simple errors are identified and reported.
• Root cause analysis is enabled using data quality rules and data validation.
• Data parsing, standardization, and cleansing tools are available.
• Data quality technology is used for entity location, record matching, and record linkage.
• A data quality impact analysis framework is in place.
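As a rough illustration of what the data quality rules introduced in phase 2 might look like in practice, here is a hypothetical sketch; the field names, rule logic, and `validate` helper are ours, not the book's:

```python
# Each rule inspects a record and returns True when the record passes.
# validate() reports the names of the rules a record fails, which is the
# kind of signal a simple-error report or root cause analysis starts from.
import re

def rule_contact_name_present(record: dict) -> bool:
    return bool(record.get("contact_name", "").strip())

def rule_phone_format(record: dict) -> bool:
    return bool(re.fullmatch(r"\d{3}-\d{3}-\d{4}", record.get("phone", "")))

RULES = {
    "contact name present": rule_contact_name_present,
    "phone matches NNN-NNN-NNNN": rule_phone_format,
}

def validate(record: dict) -> list[str]:
    """Return the names of the rules this record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]

print(validate({"contact_name": "", "phone": "555-1234"}))
# ['contact name present', 'phone matches NNN-NNN-NNNN']
```

Counting failures per rule over time is one way such rules feed the impact analysis framework mentioned above.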

5.11.3 Operationalizing Data Quality Management

Many of the ongoing operational aspects of a data quality program are put into place during phase 3:
• A data governance board consisting of business and IT representatives from across the enterprise is in place.


• Expectations associated with dimensions of data quality related to data values, formats, and semantics can be articulated.
• Standards are defined for data inspection for determination of accuracy.
• Standardized procedures for using data quality tools for data quality assessment and improvement are in place.
• Data standards metadata is managed within participant enterprises.
• Data quality service components identify flaws early in the process.
• Data quality service components feed into performance management reporting.

5.11.4 Incremental Maturation

Phase 4 establishes most of the characteristics of the level 3 maturity:
• Guiding principles, charter, and data governance are in place.
• A standardized view of data stewardship across different applications and divisions, and a stewardship program, is in place.
• Capability for validation of data is established using defined data quality rules.
• Performance management is activated.
• Data quality management is deployed at both the participant and enterprise levels.
• Data validation is performed automatically and only flaws are manually inspected.
• Business rule–based techniques are employed for validation.
• Guidelines for standardized exchange formats (e.g., XML) are defined.
• Structure and format standards are adhered to in all data exchanges.
• Auditing is established based on conformance to rules associated with data quality dimensions.
• Consistent reporting of data quality management is set up for necessary participants.
• An issues tracking system is in place to capture issues and their resolutions.

5.11.5 Assess, Tune, Optimize

The activities at phase 5 complete the transition to maturity level 3:
• Data contingency procedures are in place.
• Technology components for implementing data validation, certification, assurance, and reporting are in place.


• Technology components are standardized across the enterprise at the service and implementation layers.
• Enterprise-wide data standards metadata management is in place.
• Exchange schemas are endorsed through a data standards oversight process.

5.12 Practical Steps for Developing the Road Map

As a practical matter, these steps can be taken to lay out a road map for building a data quality program:
• Assess the current level of data quality maturity within the organization in comparison with the maturity model described in chapter 3.
• Determine those data quality issues with material impact.
• Articulate alternatives for remediation and elimination of root causes.
• Prioritize the opportunities for improvement.
• Assess business needs for processes.
• Assess business needs for skills.
• Assess business needs for technology.
• Map the needs to the associated level of data quality maturity.
• Develop a plan for acquisition of skills and tools to reach that targeted level of maturity.
• Plan the milestones and deliverables that address the needs for data quality improvement.

5.13 Accountability, Responsibility, and Management

Another important aspect of the data quality road map involves resource management; the first challenge in coordinating the participants and stakeholders in a data quality management program is knowing where to begin. Often, it is assumed that starting an initiative by assembling a collection of stakeholders and participants in a room is the best way to begin. Before sending out invitations, however, consider this: without well-defined ground rules, these meetings run the risk of turning into turf battles over whose data, definitions, business rules, or information services are the "correct" ones.


Given the diversity of stakeholders and participants (and their differing requirements and expectations), how can we balance each individual's needs with the organization's drivers for data quality? There are a number of techniques that can help in organizing the business needs in a way that can in turn manage the initial and ongoing coordination of the participants. These include establishing processes and procedures for collaboration before kickoff, developing ground rules for participation, and clarifying who is responsible, accountable, consulted, and informed regarding the completion of tasks.

5.13.1 Processes and Procedures for Collaboration

Assembling individuals from different business areas and applications will expose a variety of opinions about the names, structures, definitions, sources, and reasonable uses for data concepts used across the organization. In fact, it is likely that there is already a lengthy corporate experience regarding the definition of common terms (e.g., "what is a customer?"). To reduce replication of effort, take the time to establish rules for interaction in the context of a collaborative engagement where the participants methodically articulate the needs and expectations of their representative constituencies. The process should detail the approach for documenting expectations and provide resolution strategies whenever there are overlaps or conflicts with respect to defining organizational business needs.

5.13.2 Articulating Accountability: The RACI Matrix

In chapter 2 we discussed characteristics of the participants and stakeholders associated with a data quality management program. To ensure that each participant's needs are addressed and that their associated tasks are performed appropriately, there must be some delineation of specific roles, responsibilities, and accountabilities assigned to each person. One useful model is the RACI (Responsible, Accountable, Consulted, and Informed) model. A RACI model is a two-dimensional matrix with tasks listed along the rows and roles listed along the columns. Each cell in the matrix is populated according to these participation types:
• R if the listed role is responsible for deliverables related to completing the task;
• A if the listed role is accountable for delivering the task's deliverables or achieving the milestones;


• C if the listed role is consulted for opinions on completing the task; or
• I if the listed role is informed and kept up to date on the progress of the task.

Figures 5.9 and 5.10 provide a sample RACI matrix associated with some of the data quality processes described in chapter 2. Again, this template and its assigned responsibilities are a starting point and are meant to be reviewed and refined in relation to the roles and relationships within your own organization.
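A RACI matrix maps naturally onto a simple nested structure. The sketch below uses task and role names echoing Figures 5.9 and 5.10, but the cell assignments are illustrative, not the book's; it also checks the common consistency rule that each task has exactly one Accountable role:

```python
# A RACI matrix as a dict of tasks, each mapping role -> participation
# type (R, A, C, or I). Cell assignments here are illustrative only.

raci = {
    "Business impact analysis": {
        "Senior Manager": "A", "Business Client": "C",
        "Data quality manager": "R", "Data quality analyst": "I",
    },
    "Data quality issue tracking": {
        "Data quality manager": "A", "Data quality analyst": "R",
        "Operations staff": "I",
    },
}

def check_single_accountable(matrix: dict) -> list[str]:
    """Return the tasks that do not have exactly one 'A' assignment."""
    return [task for task, cells in matrix.items()
            if list(cells.values()).count("A") != 1]

print(check_single_accountable(raci))  # [] -- every task has one A
```

Running such a check when the matrix is drafted catches the turf-battle cases, tasks with no accountable owner or with several, before kickoff rather than during it.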

5.14 The Life Cycle of the Data Quality Program

At the beginning of a data quality initiative, there may seem to be a never-ending list of issues that need to be addressed, and as a team works its way through this list, two interesting, counterintuitive phenomena will become clear. The first is that tracking down and fixing one reported issue often results in the correction of other problems reported to the list. The other is that even as you eliminate some problems, new issues will emerge from the existing test suites.

Sitting back and thinking about this provides some insight into the process, and ultimately suggests an interesting idea about planning for any quality management program. There are good explanations for both of these results, and examining the life cycle of the quality management process should help in developing a winning argument for the support of these programs.

Consider the first by-product, in which fixing one problem results in other problems mysteriously disappearing. Apparently, even though more than one issue is reported, they all share the same root cause. Because the people reporting the issue only understood the application's functionality (but did not have a deep knowledge of how the underlying application was designed or how it worked), each issue was perceived to be separate whenever the results or side effects differed. Yet when issues share the same root cause, the process of analyzing, isolating, and eliminating the root cause of one failure also eliminates the root cause of the other failures. The next time you evaluate the errors, the other issues sharing the same root cause will no longer fail.

The second by-product is a little less intuitive, because one would think that by finding and fixing problems, the result should be fewer issues, when in fact it is likely to result in more


[Figure 5.9 Sample data quality RACI matrix – part 1. Roles (columns): Senior Manager, Business Client, Application Owner, Data governance manager, Data quality manager, Data steward, Data quality analyst, Metadata analyst, System developer, Operations staff. Tasks (rows): business impact analysis; data quality requirements analysis; data quality assessment – bottom-up; data quality assessment – top-down; engage business data consumers; define data quality metrics; define, review, prioritize DQ measures; set acceptability thresholds; data standards management; active metadata management; define data validity rules; data quality inspection and monitoring; DQ SLA. Each cell assigns R, A, C, or I.]


[Figure 5.10 Sample data quality RACI matrix – part 2. Roles (columns): Senior Manager, Business Client, Application Owner, Data governance manager, Data quality manager, Data steward, Data quality analyst, Metadata analyst, System developer, Operations staff. Tasks (rows): enhanced SDLC for DQ; data quality issue reporting; data quality issue tracking; root cause analysis; process remediation; data correction; data standardization and cleansing; identity resolution; data enhancement. Each cell assigns R, A, C, or I.]


issues. What actually happens is that fixing one reported problem enables a test to run past the point of its original failure, allowing it to fail at some other point in the process. Of course, this (and every other newly uncovered) failure will need to be reported to the issue list, which will initially lead to an even longer list of issues.

Rest assured, though, that eventually the rate of discovery of new issues will stabilize and then decrease, while at the same time the elimination of root causes will continue to shorten the list of issues. If you prioritize the issues based on their relative impact, as more problems are eliminated, the severity of the remaining issues will be significantly lower as well. At some point, the effort needed to research the remaining issues will exceed the value achieved in fixing them, and at that time you can effectively transition into proactive mode, decreasing your staffing needs as accountability and responsibility are handed off to the application owners. In other words, this practical application of the Pareto principle demonstrates how reaching the point of diminishing returns allows for better resource planning while reaping the most effective benefits.

There are some lessons to be learned with respect to data quality issue analysis:
1. Subjecting a process to increased scrutiny is bound to reveal significantly more flaws than originally expected.
2. Initial resource requirements will be necessary to address the most critical issues.
3. Eliminating the root causes of one problem will probably fix more than one problem, improving quality overall.
4. There is a point at which the resource requirement diminishes because the majority of the critical issues have been resolved.

These points suggest a valuable insight: there is a life cycle for a data quality management program. Initially there will be a need for more individuals focusing a large part of their time on researching and reacting to problems, but over time there will be a greater need for fewer people concentrating some of their time on proactively preventing issues from appearing in the first place. In addition, as new data quality governance practices are pushed out to others across the organization, the time investment is diffused across the organization as well, further reducing the need for long-term dedicated resources. Knowing that the resource requirements are likely to be reduced over time may provide additional business justification to convince senior managers to support establishing a data quality program.


5.15 Summary

The life cycle of the data quality management program dovetails well with the maturity model described in chapter 3. The lower levels of the maturity model reflect the need for reacting to data quality issues, while, as the organization gains more expertise, the higher levels of maturity reflect more insight into preventing the process failures that lead to data issues.

As a practical matter, exploring areas of value for developing a successful business case will help in mapping out a reasonable and achievable road map. Consider an initial exercise that involves working with some senior managers to seek out those "house on fire" issues, namely by following these steps as reviewed in this chapter:
1. Identify five business objectives impacted by the quality of data.
2. For each of those business objectives:
   a. Determine cost/impact areas for each flaw.
   b. Identify key quantifiers for those impacts.
   c. At a high level, assess the actual costs associated with that problem.
3. For each data quality problem:
   a. Review solution options for that problem.
   b. Determine costs to implement.
4. Seek economies of scale to exploit the same solution multiple times.

At the conclusion of this exercise, you should have a solid basis of information to begin to assemble a business case that not only justifies the investment in the staff and data quality technology used in developing an information quality program, but also provides baseline measurements and business-directed metrics that can be used to plan and measure ongoing program performance.
