Breaking the Knowledge Acquisition Bottleneck Through ... · Bazaar style development, how-ever, occurs over the Internet in constant pub-lic view. Raymond identified principles of

70 Information Resources Management Journal, 19(1), 70-83, January-March 2006

Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc.is prohibited.

Breaking the Knowledge AcquisitionBottleneck Through Conversational

Knowledge ManagementChristian Wagner, City University of Hong Kong, Hong Kong,

and Claremont Graduate University, USA

ABSTRACT

Much of today’s organizational knowledge still exists outside of formal information repositoriesand often only in people’s heads. While organizations are eager to capture this knowledge,existing acquisition methods are not up to the task. Neither traditional artificial intelligence-based approaches nor more recent, less-structured knowledge management techniques haveovercome the knowledge acquisition challenges. This article investigates knowledge acquisitionbottlenecks and proposes the use of collaborative, conversational knowledge management toremove them. The article demonstrates the opportunity for more effective knowledge acquisitionthrough the application of the principles of Bazaar style, open-source development. The articleintroduces wikis as software that enables this type of knowledge acquisition. It empiricallyanalyzes the Wikipedia to produce evidence for the feasibility and effectiveness of the proposedapproach.

Keywords: knowledge acquisition; knowledge artifacts; knowledge management; opensource development; wiki

INTRODUCTIONEver since the development of artificial

intelligence (AI) and expert systems, there hasbeen the promise of capturing an organization’sknowledge on a large scale and making it avail-able to the entire organization. Unfortunately,these promises did not materialize (Buchanan& Smith, 1988; Ullman, 1989). While there havebeen several early success stories, such as

American Express’ Credit Advisor or Digital’sExpert Configurer (XCON), attempts to acquirethe broad knowledge of organizations havebeen less fruitful. More than a decade later, adecidedly optimistic survey by Frappaolo andWilson (2003) found that no more than 32% ofthe knowledge was available in computerizedform. Obviously, knowledge acquisition is achallenge. How can we extract more of the ex-

Information Resources Management Journal, 19(1), 70-83, January-March 2006 71


isting knowledge from organizational sources,especially from people? And how can we man-age the maintenance so as to assure that thestored knowledge is accurate and up-to-date?Discovering answers to these questions is im-portant for organizations as information workbecomes knowledge work, thus requiringknowledge to support non-routine decisionmaking (Drucker, 1993, 1999). It is similarly im-portant for organizations whose corporate por-tals that were set up years ago increasingly arebecoming dated and stale (Newcombe, 2000).Furthermore, it is important for organizations inthe business of creating knowledge assets whoare faced with increased costs of knowledgecreation, shorter knowledge life cycles, and in-creased knowledge obsolescence.

Seeking a solution to the problems of or-ganizational knowledge acquisition, the articlemakes the following argument. First, it intro-duces previous approaches to knowledge ac-quisition, identifies four limitations, and of-fers evidence for these limitations. The articlethen refers to Bazaar style (software) devel-opment (Raymond, 2001) as a potential direc-tion for knowledge asset creation. It then ex-plains the concept of conversational knowl-edge management and advocates wiki tech-nology and the “wiki way” (Leuf &Cunningham, 2001) as a possible approach tousing Bazaar-style methods in conversationalknowledge management. An empirical analy-sis of the viability and effectiveness of theapproach follows. The article ends with impli-cations and conclusions about the future ofconversational knowledge management.

KNOWLEDGE ACQUISITION

Approaches toKnowledge Acquisition

Organizations that try to acquire organi-zational knowledge formally (based on artificialintelligence methods) have relatively few avail-able alternatives. For application areas with largeamounts of transaction data, data mining caninduce rules from that data. Data mining solu-

tions work well for high-volume applicationssuch as credit approval. Even then, the knowl-edge creation effort is highly resource-inten-sive (Lee, 2001). When insufficient data vol-umes thwart data mining efforts, the acquisi-tion activity has to elicit knowledge directlyfrom experts as rules and facts or similar formalrepresentations. This should be done under theguidance of knowledge engineers trained inknowledge elicitation, formalization, and repre-sentation. Yet a knowledge engineer’s produc-tivity is limited to hundreds of rules per year fordevelopment and maintenance (Sviokla, 1990;Turban & Aronson, 2000). This productivitylevel may be acceptable for high value-addedprojects but limits the broad applicability of theapproach. Smaller projects have attempted torely on capturing knowledge without knowl-edge engineers, relying on end-user develop-ment. The latter has not been very successful(Wagner, 2000, 2003). Wagner found end-userexpert systems often to be poorly structured,incomplete, highly coupled, and thus, difficultto maintain. Artificial intelligence-based meth-ods thus are facing considerable applicabilityconstraints. Consequently, organizationalknowledge management efforts have soughtto capture knowledge in less formal ways; forinstance, by extending document managementand groupware systems into knowledge man-agement systems (Davenport & Prusak, 1998;Holsapple & Joshi, 2002) in part through betterindexing, search engines, and linking.

Yet challenges remain. When organiza-tions try to make sense out of large volumes ofdocuments in their document management sys-tems, they usually need search engines, textmining, and automatic indexing tools, resultingin an expensive solution with limited success(Bygstad, 2003). Furthermore, this approach iswell suited only for relatively stable and cen-tralized knowledge bases. Users of such knowl-edge bases often encounter information over-load, irrelevant responses, or no response toqueries. Alternatively, organizations might useexpert reports and harvest expert knowledge tocapture the methods used by domain experts(Snyder & Wilson, 1998). Again, this method



often is limited to niche applications, requiresconsiderable effort, and still faces knowledgemaintenance difficulties (Malhotra, 2000). Othersolutions, such as corporate controlled portals,can quickly suffer from outdated knowledgeand lack of maintainability (Newcombe, 2000).

Knowledge Acquisition BottleneckIn summary, we can describe the knowl-

edge acquisition bottleneck as follows (Wagner,2000; Waterman, 1986):

• Narrow bandwidth. The channels that existto convert organizational knowledge fromits source (either experts, documents, ortransactions) are relatively narrow.

• Acquisition latency. The slow speed of ac-quisition frequently is accompanied by a de-lay between the time when knowledge (orthe underlying data) is created and whenthe acquired knowledge becomes availableto be shared.

• Knowledge inaccuracy. Experts make mis-takes and so do data mining technologies(finding spurious relationships). Further-more, maintenance can introduce inaccura-cies or inconsistencies into previously cor-rect knowledge bases.

• Maintenance trap. As the knowledge in theknowledge base grows, so does the require-ment for maintenance. Furthermore, previ-ous updates that were made with insuffi-cient care and foresight (“hacks”) will accu-mulate and render future maintenance in-creasingly more difficult (Land, 2002).

Given these challenges, it appears thatthere are few opportunities for breaking theknowledge acquisition bottleneck. The nextsection will propose one possible remedy.

LEARNING FROMSOFTWARE DEVELOPMENT

One area that has offered lessons for thesuccessful creation of knowledge assets is soft-ware development, and specifically open sourcesoftware development by distributed teams of

volunteers. Open source projects engage soft-ware developers, wherever they may reside, andhave them collaboratively develop the knowl-edge asset (the software). Surprisingly, thisactivity takes place with little centralized man-agement. Raymond (2001) characterized thisapproach to software development as the Ba-zaar style in contrast to the traditional cathe-dral style of development. Cathedral is a meta-phor for the development of a large monolithicartifact through a structured and lengthy de-velopment process. Fundamental to the ca-thedral style approach is that source code isonly widely available at release dates with ac-cess restricted to a few developers betweenrelease dates. Bazaar style development, how-ever, occurs over the Internet in constant pub-lic view. Raymond identified principles of thisdevelopment style that challenge the assump-tion that large and complex software assetsneed to be built via an a priori, centralized ap-proach. Overall, four themes guide this devel-opment approach, which can be characterizedas follows: (1) design simplicity of the artifact,(2) team work, (3) frequent creation of a visiblework product, and (4) development as an on-going conversation. This section introduces aframework of open source (software) develop-ment, identifies its benefits, and derives les-sons about the applicability for knowledge as-sets other than software.

Open Source Software DevelopmentOpen source software development, as

described, for instance, by Raymond (2001),Benkler (2002), and Markus, et al. (2000), relieson several factors to achieve success (and thus,performance of the knowledge creation effort).Key success factors (see Figure 1) consist of asuitable artifact, a skilled and motivated teamof volunteer users and developers, a lean andtransparent development process, and light-weight but effective governance. Added to thisis an enabling factor; namely, an appropriatetechnology infrastructure, which, for instance,permits frequent releases, accommodates vot-ing mechanisms to govern the community, orenables fast and reliable version management,



all with little overhead and few transaction costs.With these factors in place, open source soft-ware development promises faster developmentspeed than proprietary approaches (includinghigher developer productivity) and a betterquality product, which is also free.

Open source software development hashad remarkable successes, creating softwarethat appears to break long-standing rules ofsoftware evolution (Scacchi, 2004). For example,open source software size has been shown togrow super-linear (exponential) rather than lin-ear or inverse-square (Mockus et al., 2002).

Bazaar-StyleKnowledge Management

Can Bazaar-style development be appliedsuccessfully to the creation of knowledge as-sets other than software? Several leaders ofthe open source community have hypothesizedthis, including Torvalds (Hamm, 2004). YetTorvalds also acknowledged that not all knowl-edge assets are equally suitable, as the cre-ation process may be too personal or too linear.Hence, in order to extend the lessons and ben-efits of Bazaar-style development, we shouldtarget applications where the core themes can

be applied: (1) simplicity of design and frequentredesign (refactoring) to maintain simplicity, (2)teamwork (3) frequent creation of a small workproduct available for review and testing, and(4) development as conversation to facilitateback-up, clarity, and shared understanding.Applications of this kind exist within organiza-tions, and among organizations and people. Forexample, companies could conceivably turntheir traditional help desks into open help desks,where customers would openly share their prob-lems with others, help each other, and free upcompany experts to tackle only the most diffi-cult problems. Unfortunately, companies fre-quently do not want to relinquish control oftheir (closed) help desk. Open help desks existon the Web, typically as discussion forums ofquestions and answers. While they embodyteamwork and conversation, the resulting workproduct often is not simple and well-structuredbut lacks organization and is filled with repeti-tion and inconsistencies.

Consequently, one necessary conditionfor this research was to find a knowledge assetthat was highly amenable to the Bazaar-styledevelopment approach and that used a tech-nology that facilitated this type of development.

Figure 1. Framework for open source software development

Technology

Artifact • “Value proposition” • Credible core • Modular design • Standards based

Group • Core team • Skilled individuals • Voluntary participation • User / developer duality

Process • Frequent, early releases • Continuous peer review • Parallel, distributed

development • Version management • Low participation overhead • Full disclosure of technology

functions

Governance • Lightweight, but with

operational, collective choice, and constitutional choice rules

• Free redistribution of the work product and its derivatives

• Open license

Performance

• Fast development • High quality • Free product



The selected asset was an online encyclopedia— Wikipedia (wikipedia.org) — that employswiki technology and the “wiki way” of knowl-edge asset creation (Leuf & Cunningham, 2001).The article will provide more detail on theWikipedia application, following a briefing onknowledge management with wiki technologyand the wiki way.

CONVERSATIONALKNOWLEDGE MANAGEMENTWITH WIKIS

Knowledge management with wikis hasrecently drawn media attention (Brown, 2004;Hof, 2004; Ripley, 2003) as a new, end user de-veloped approach founded on collaborationand conversation. Collaborative knowledgemanagement means that many people work to-gether to create or acquire knowledge insteadof a few individual experts. In other words, acommunity (of practice) will jointly create andmaintain the knowledge. Research elsewhere(Cheung et al., 2005) suggests that conversa-tional knowledge management is well suited forthis challenge, whereby conversations (i.e.,questions and answers) become the source ofrelevant knowledge.

Conversational knowledge managementhas become popular in communities that formaround discussion boards. Leading solutionssuch as ezboard or Yahoo groups are now usedby millions of communities1. Yet while discus-sion forums have been a simple and practicalsolution to share knowledge through conver-sation, they lack several useful knowledge rep-resentation and maintenance features. For ex-ample, discussion forum postings, even withina single thread, often do not build upon eachother. As a result, the latest post may not be anincremental improvement of earlier ones. Analternative technology, which combines themost desirable features of other conversationaltechnologies, is the wiki. This section discusseswiki technology and its suitability for knowl-edge management.

Wiki Structure and PrinciplesA wiki is a set of linked Web pages cre-

ated through the incremental development bya group of collaborating users (Leuf &Cunningham, 2001) as well as the software usedto manage the set of Web pages. WardCunningham developed the first wiki in 1995 asthe PortlandPatternRepository in order to com-municate specifications for software designwithin a large, heterogeneous community. Theterm wiki (from the Hawaiian wikiwiki, mean-ing fast) references the speed with which con-tent can be created with a wiki. Wiki key char-acteristics are as follows:

• It enables Web documents to be authoredcollectively;

• It uses a simple markup scheme (usually asimplified version of HTML, althoughHTML frequently is permitted);

• Wiki content is not reviewed by any editoror coordinating body prior to its publica-tion; and

• New Web pages are created when users navi-gate a hyperlink that points nowhere.

Underlying these characteristics are spe-cific principles that have shaped wiki softwareas well as its use. They are intended to producea development environment where multiplepeople easily can create and modify a set ofjointly owned Web pages. Wiki pages are ex-pected to be open, incrementally developed,and organic; require little markup; have consis-tent edit functions and clear naming, be heavilyhyperlinked and easily observable (found). Asa result, wiki pages are expected to change andimprove incrementally.

Wikis in Use

Creating Wiki PagesCreating and editing wiki pages is a simple

activity. A wiki author will use a Web-enabledformfield to enter a comment he or she wishes topublish. Authors can use plain text or a simpli-fied markup language. The system then auto-



matically generates and publishes a Web pagewith a unique URL that can be indexed and linkedto. Hence, users with virtually no Web publish-ing knowledge can create Web content about asquickly as they can write a text document.

Linking Wiki PagesA fundamental aspect of knowledge man-

agement with wikis is the use of simplehyperlinks. Hyperlinks link topics and createcontext. Wikis drastically simplify hyperlinking.To link pages within a wiki, users do not haveto create and use URLs (although they can).Instead, they normally use CamelCase (multiplewords capitalized and concatenated) or doubleparentheses around a term ([[term]]) in order tocreate a link. Links whose destination (page)does not exist are depicted as question marks(or similar) as if the author were asking a ques-tion. Another author (or the original creator)then can respond by clicking on the questionmark, thus navigating the hyperlink to a newpage and invoking an editor to write that page.Upon completion of the edit, the question markautomatically will be rendered as a regularhyperlink (now underlined text) pointing to thenew page.

VersioningAs multi-user systems, wikis enable ev-

ery user to modify any other user’s Web pages(unless explicitly forbidden by access right set-tings). This creates challenges in version man-agement. Wikis solve them by keeping priorversions of any Web page in memory, and en-abling rollback, comparison, difference identi-fication, and similar capabilities, if so desired.Wikis also track the history of prior changeswith author, date, and related information.

Wikis and Open Source PrinciplesKnowledge management using wikis

and the wiki way (see, for instance,“WhyWikiWorks” at http://c2.com/cgi/wiki?WhyWikiWorks) appear to bear consid-erable resemblance to open source softwaredevelopment, described in part by the follow-ing traits:

• Sense of responsibility in contributing to acommon good;

• Openness to change and modification byanyone;

• Meritocracy (anyone can play, but onlygood players last);

• Self-governance of the developer team;

• Task decomposition and incremental devel-opment;

• Use of technology for communication andcoordination, as well as norms for their use,including objectivity (neutral point of view);and

• Ease of use for knowledge creation and main-tenance.

Thus, as an enabling technology, wikisestablish an environment to develop the rightartifact, to use a Bazaar-style process, to en-gage teams in voluntary collaboration, and togovern the effort with a lightweight structure(Figure 1), thus offering the potential for opensource knowledge management. In open sourcesoftware development, the corresponding re-sults are ultimately lower error rates (comparedto closed source); fast(er) development speed;and the ability to develop large(r) applications,accelerated development, and high(er) maintain-ability of the source code (Mockus et al., 2002).Whether these same benefits accrue in wiki-enabled open source knowledge managementmust be determined empirically.

ASSESSING CONVERSATIONALKNOWLEDGE MANAGEMENT

Can principles of Bazaar-style develop-ment be applied to knowledge management, andif so, will they improve knowledge acquisitioneffectiveness? To begin to answer these ques-tions, the research analyzed a single case of wiki-enabled knowledge asset creation — Wikipedia.

Knowledge Asset:Wiki-Based Encyclopedia

Encyclopedias reasonably can be char-acterized as knowledge assets. While one may



debate how much of their content is informa-tion instead of knowledge, encyclopedias con-tain insights (factual), rules (inferential), prin-ciples (inferential), and so forth. They also fitthe definition of information in context (Dav-enport &Prusak, 1998), since they frequentlylink concepts to other concepts (cross-refer-encing). By design, encyclopedias also arerelatively loosely coupled knowledge assets,whose components (articles) can exist inde-pendently. Encyclopedias frequently are com-piled from the work of a group of authors whoknow little about each other or each other’swork. Encyclopedia articles have commonstructural elements, since all articles are defi-nitions. They typically also follow some stan-dards for articles of a similar type (e.g., all biog-raphies are structured similarly and differentfrom city descriptions).

The majority of digital encyclopedias,such as Britannica, Encarta, Compton, or Grolier,is closed source. They are compiled by a rela-tively small group of commissioned writers andeditors. The result of their work only becomesavailable to the readership once the entire editprocess has been completed and the new en-cyclopedia version is released. Yet, because oftheir loosely structured nature, encyclopedias(and other, similar knowledge assets) also canbe created in Bazaar style, given certain condi-tions. The work product cannot be an off-lineproduct such as a book or a CD; the technologyin general has to be amenable to Bazaar-styleknowledge acquisition and representation, andthe organization creating the encyclopedia hasto formulate procedures and methods that en-able this type of knowledge acquisition. Bazaar-style knowledge acquisition, therefore, becomesa possibility when the asset is created followingthe wiki way. Hence, Wikipedia, the online ency-clopedia developed as a wiki, was used as theknowledge asset to be analyzed for this research.Wikipedia is one of several knowledge productsdeveloped over the last few years with wiki tech-nology and the wiki way of development. Otherapplications include Wikitravel and Wikibooks.Development of Wikipedia began in 2001. Asof May 2004, less than three and a half years

later, the (English) Wikipedia contains about280,000 articles.

Wikipedia, applying wiki principles, ap-pears to enable its developers to use a Bazaar-style approach. Specifically, writers can makeincremental changes and then commit and pub-lish them immediately. Also, articles can be writ-ten by numerous writers as joint authors, thusbuilding on the work of others or correctingmistakes. Furthermore, Wikipedia rules stressan authoring etiquette that incorporates rulesof article design and redesign targeted towardsimple and clear articles. In other words, it ispossible for Wikipedia authors to follow themain themes of Bazaar-style development.Whether authors do so and whether the out-come of their efforts is consistent in its effec-tiveness with Bazaar-style software develop-ment needs to be determined empirically.

Research QuestionsThe research sought to address two ques-

tions through empirical analysis.

1. Is conversational knowledge management,as demonstrated in Wikipedia, consistentwith Bazaar-style knowledge asset creation?

2. Is conversational knowledge management,as illustrated by Wikipedia, able to achievethe benefits of Bazaar-style development?

The research thus needed to determinewhether “Wikipedians” would follow Bazaar-style knowledge acquisition and whether theeffect would be improved knowledge acquisi-tion. Based on the criteria in Figure 1, numer-ous questions would have to be addressed.Yet, as Table 2 illustrates, compliance with themajority of criteria was confirmed fromWikipedia information (Wikipedia Web site andWikimedia Meta-Wiki), leaving four core ques-tions to be answered.

Thus, the research questions focusedon the incremental nature of the knowledgeacquisition effort, the multi-person effort, andthe effect on the growth and quality of thework product, as described in the followingsubsections.



Incremental Developmentwith Frequent Releases

Incremental development and frequent re-leases are fundamental to Bazaar-style develop-ment. Would Wikipedians follow this approach,or instead would they prefer to write an authorita-tive article in an effort burst with few revisions inthe process and even fewer thereafter?

To answer this question, the researchexplored (1) the frequency of article edits and(2) the change in article size. If the effort werenon-incremental, one would expect a relativelyshort development period of high activity (sincean article is typically a few hundred to a fewthousand words long) followed by little editingactivity thereafter, possibly with some mainte-nance and some extensions. An incrementaleffort, in contrast, would result in a high levelof activity with many edits during an extendeddevelopment period followed by a much-ex-tended maintenance period with lower yet stillconsiderable update efforts. To operationalizethe assumption, the research adopted the

Pareto rule, thus hypothesizing that if Wikipediaarticles were written in a non-incremental ef-fort, then 80% of their size growth and 80% ofthe edit efforts should occur during the first20% of their existence:

H1: Wikipedia articles are the outcome of anincremental development, and therefore,their growth and edit pattern does notfollow the 80-20 Pareto rule.

Multi-Person EffortThere is little doubt that Wikipedia is a

multi-person effort with presently more than7,000 people contributing to it and more than500 people making more than 100 contributionseach per month (see Wikistats at http://w w w. w i k i p e d i a . o r g / w i k i s t a t s / E N /TablesWikipediaEN.htm). However, accordingto the principles of Bazaar-style development,one would expect Wikipedia development tobe a team effort at a more detailed level withmultiple authors working on each article in order

Table 2. Wikipedia fit with Bazaar-style development criteria

Dimension Bazaar-Style Development Wikipedia Adaptation “Value proposition” Yes – create free and open encyclopedia Credible core Yes / No – not a content core, but a developer group core Modular design Yes – loosely coupled articles

Artifact

Standards based Yes – article structures Core team ? – Is Wikipedia development a team effort? Skilled individuals Yes/No – participants from the Nupedia initiative all had

PhD degrees, but no control over new participants Voluntary participation Yes – only one paid chief editor, Larry Sanger (until

2002)

Group

User / Developer duality Yes – author and users Frequent, early releases Continuous peer review

? – Are Wikipedia articles developed through an incremental approach with continuous releases?

Parallel, distributed development Yes – 7,000 authors worldwide Version management Yes – through wiki technology Low participation overhead Yes – through wiki technology

Process

Full disclosure of (technology) functions

Yes – through wiki technology

Lightweight, operational, collective choice

Yes – Wikimedia organization with meritocracy as governing structure

Free redistribution of the work product

Governance

Open license

Yes – GPL license

Fast development ? – Does conversational knowledge management result in linear or better growth of knowledge assets?

High quality ? – Does conversational knowledge management result in improved knowledge asset quality?

Performance

Free product Yes – GPL license



to extend it and possibly to correct mistakes.This would reflect one of the key themes of opensource, also called Linus’ [Torvalds] Law;namely, that “given enough eyeballs, all bugsbecome shallow.” Hence, the research sought todetermine whether enough eyeballs were scruti-nizing each article, at least more than two. Hence,the analysis focused on whether article publica-tion and maintenance was a multi-person effort.

H2: Knowledge acquisition and maintenancein individual Wikipedia articles is a multi-person effort.

EffectivenessThe research sought to determine

whether encyclopedia development adoptingthe wiki way would be effective. In this explor-atory study, effectiveness was measuredthrough two variables; namely, (1) growth ofthe knowledge asset and (2) quality improve-ment efforts. Growth of the knowledge assetwas determined, based on the increase in thenumber of articles in the Wikipedia over time.In line with other open source successes(Mockus et al., 2002), the expectation was thatgrowth would be linear or better (super-linear).

H3: Wikipedia growth in terms of number ofarticles will be linear or super-linear.

Unable to assess the overall quality ofthe Wikipedia objectively vis-à-vis other ency-clopedias, the research focused on processquality and specifically quality improvementefforts. These efforts were operationalized bythe ratio of edit efforts vs. the growth ofWikipedia articles. In other words, the researchtested whether editing efforts were devoted toincreasing the size of articles or to refining ex-isting articles. The assumption was that refine-ments (without significant increase in size)would improve overall quality, for instance,through an increase in presentation quality,content quality, or the inclusion of more view-points (diversity).

Hence, we computed a words-per-editratio based on the number of words (per article)

written and the number of edits it took to createthe article version. This ratio was calculated forarticles in their early stages (20% of develop-ment effort) and at their present state. Decreas-ing ratios would indicate more effort being spentover time on article refinement. To exclude in-significant edits, the research only considerednon-minor changes (counted separately inWikipedia). The expectation was that, over time,more effort would be devoted to increased ar-ticle quality. It is a stated Wikipedia goal toincrease quality as articles mature (see, for in-stance, the reply to objections concerningWikipedia, which discusses quality and growthissues, at http://en.wikipedia.org/wiki/Wikipedia:Replies_to_common_objections).

The corresponding hypothesis concern-ing quality improvement was as follows:

H4: Edit effort targeted at quality improve-ments for individual Wikipedia articleswill increase over time, demonstrated byreduced article growth per edit.

Data SourceWikipedia is an open encyclopedia in

many ways. In addition to articles being freelyaccessible, so is the history of their creation,including dates, content of each version, andauthor information. Hence, it was possible totrace changes, change frequencies, and authorcontributions. To address the first two ques-tions, 80 articles were randomly selected withthe one qualification that 40 of them had to becreated originally in 2001 and 2002. More re-cent articles were ignored because of their shorthistory. To determine knowledge asset growth,Wikipedia summary statistics were accessed,which logged the number of articles writteneach month from the start of Wikipedia.

Results

Incremental Development(Release Early and Often)

This analysis focused on two samples of40 articles from 2001 and 2002. For both of thesesamples, the edit efforts for the first 20% of



each article’s existence (up to the measurementpoint in March 2004) were compared againstthe entire development effort. The results donot support the notion of a short effort burstbut one of incremental development, as shownin Table 3.

For articles started in 2001, the first 20%of an average article’s existence accounted forabout 34% of the article’s size (793 words outof 2,319) and less than 6% of its edits (17 out of288). For the 2002 articles, the first 20% ac-counted for about 36% of article size and 15%of article edits. Overall, this was considerablyless than expected according to the Pareto rule.All results are highly significant (p < 0.0001).Hence, size grew relatively incrementally with asomewhat larger upfront effort (about 35% ofsize produced in 20% of the time). Wikipediaedits were even more incremental with a dis-proportionately small number during the earlyexistence of an article (15% or less of the editsin 20% of the time).

Multi-Person EffortEach of the 80 articles in the two samples

also was evaluated according to the number ofauthors. None of the articles in the sample wasco-authored by fewer than 18 people, and themaximum number of authors for any article was285. On average, more than 96 authors workedon an article (Table 4).

Given these results, what is the likelihoodthat articles overall were predominantly single-authored? Virtually none. A t-test showed verysignificant differences between the actual au-thor numbers and the possibility of single au-thorship. This is a strong result, yet the readeris reminded that the sample articles were oldarticles. More than half of the Wikipedia ar-ticles were less than 12 months old (as of June2004) and will have been edited by fewer people.An additional sample of 40 randomly selectedarticles started in 2003, though, still corrobo-rated the results (average of 48 authors, t =7.164, p = 0.0000). In other words, as timeprogresses, Wikipedia articles are scrutinizedby “many eyeballs”.

Wikipedia GrowthData points concerning the growth of

Wikipedia illustrate dramatic growth. AlthoughWikipedia has existed since 2001, more thanhalf of its approximately 280,000 articles (En-glish articles as of May 2004) were written sinceJune 2003. (See http://www.wikipedia.org/wikistats/EN/TablesWikipediaEN.htm).

To explore the growth pattern further, theanalysis targeted the numbers of new articlescreated each month. Three different time serieswere compared: number of articles, log of num-ber of articles, and square root of number ofarticles. For each series, the fit was computed

Table 3. Wiki development activity

20% Avg. Actual

Avg. Expected (80-20 Rule)

t (df = 39) Significance p

2001 Articles, Size (Words) 793 1,855 6.468 0.0000 2001 Articles, Edits 17 230 8.841 0.0000 2002 Articles, Size (Words) 811 1,795 4.212 0.0000 2002 Articles, Edits 24 133 6.820 0.0000

Table 4. Wikipedia article author statistics

Min. No. Authors

Max. No. Authors

Avg. No. Authors

t-Statistic Significance p

2001 Articles 33 285 121.4 10.33 0.0000 2002 Articles 18 268 70.8 7.870 0.0000 All Articles 18 285 96.1 12.21 0.0000



to determine which one best predictedWikipedia growth. As Table 5 illustrates,Wikipedia growth is best explained by a qua-dratic function (R2 = 0.988, highest). In otherwords, Wikipedia article growth is most likelyquadratic and, thus, super-linear, which is anaggressive growth pattern. Quadratic growthalso best explained the increase in the numberof Wikipedians and in the number of edits(changes) made to Wikipedia.

Quality ImprovementThe second effectiveness measure, the

allocation of effort to quality improvement, sug-gested a shift toward more quality as Wikipediaarticles aged. Table 6 illustrates that during thefirst 20% of an article’s existence, each edit re-sulted in about 60 additional words vs. 11 orfewer words for the remaining 80% of thearticle’s life (up to the measurement date inMarch 2004).

The differences in the means of these ra-tios were highly significant, confirming thatlater effort is an investment in article qualityrather than article length.

DiscussionResults of the exploratory study confirm

what has been expected. Hypotheses H1, H2,H3, and H4 were all confirmed. Knowledge ac-quisition efforts apparently can successfullyadopt Bazaar-style development with multi-userinvolvement, incremental changes, and quick

releases in an environment that enables con-versational knowledge acquisition. In the caseof Wikipedia, this was possible for several rea-sons. First, Wikipedia was able to draw a largeand quadratically growing developer group (ap-proximately 7,000 as of May 2004).

Second, Wikipedia pages are highlydecoupled from each other so that new authorscan write with little concern for the current con-tent of other pages. When an author breaks ahyperlink or negatively affects content, it be-comes quickly apparent, and other Wikipedianswill fix the problem. Third, when authors make acontribution, whether writing a new page orchanging an existing article, the result is imme-diately visible to the entire community, thusenabling quick releases with minimal latencyand multi-user quality assurance. Therefore, thetransaction cost of making a contribution is low,much lower than in any peer-reviewed or closedsource authoring environment (Ciffolilli, 2003).Fourth, there is no individual ownership ofWikipedia pages, which are developed by vol-unteers; thus, everyone works to improveeveryone’s contributions. Quality is everyone’sresponsibility. Fifth, Wikipedia has strong edit-ing guidelines that are motivated by therefactoring rules of software development andprinciples of objectivity. This ensures that ar-ticles, which might have suffered in readabilityfrom the disjointed work of multiple contribu-tors and commentator, ultimately become veryreadable again.

Table 5. Growth in Wikipedia articles (articles official count, March 2004)

Relationship R2 p Linear 0.932 0.0000 Exponential (log) 0.819 0.0000 Quadratic (square root) 0.988 0.0000

Table 6. Words per edit by article age

20% - Avg. Words / Edit

80% - Avg. Words / Edit

t (df = 39) Significance p

2001 Articles 57.8 6.7 5.848 0.0000 2002 Articles 64.4 10.7 4.239 0.0001



As a result, in three and a half years ofexistence, Wikipedia has challenged the other-wise largest but closed authorship Encyclope-dia Britannica (Britannica Online) for leader-ship in content (Britannica has self-reportedlyabout 100,000 entries, although with a largerword count per article). Other wiki-supportedknowledge assets, such as Wikitravel, for in-stance, may be able to achieve similar leader-ship roles in their knowledge domain. The open,multi-user model also appears to scale well byinteresting an increasingly larger user popula-tion to contribute their efforts, thus keepingthe article latency at about 10 days for old ar-ticles (initially created in 2001) and less thantwo days for newer articles (i.e., 2003). How-ever, since wiki technology is relatively newand contrary to many organizations’ cultures,we should not expect this approach to becomepredominant soon. In fact, the successes arefew at present. However, one should expect anincreasing number of wiki software products toemerge in the future and an increasing numberof communities to replace their inferior conver-sational technologies with wikis.

CONCLUSIONThe challenge of capturing and maintain-

ing exponentially growing volumes of knowl-edge requires new ways of knowledge acquisi-tion; namely, on approaches that rely on thecontributions of many rather than the expertiseof a few. Wiki technology and the wiki way ofcollaboration show a feasible model for knowl-edge acquisition and maintenance. Wikipediaoffers an illustration of the effectiveness of thisapproach. The research demonstrates that us-ers of a wiki-based knowledge asset (i.e.,Wikipedia) apply Bazaar-style methods andtechniques in their conversational knowledgeasset creation. The research also suggests thatknowledge acquisition through collaborationand conversation can lead to super-linearknowledge asset growth and continuous qual-ity improvement.

Not surprisingly, there are several cave-ats. For instance, knowledge quality cannot bemeasured or managed easily. The quality of

Wikipedia articles, for instance, remains asource of arguments. Therefore, future researchwill need to investigate the quality of the re-sulting knowledge based on content. In addi-tion, knowledge creation with wikis relies on astrong and positive social contract among itscontributors and on subject matters that arenot controversial. These conditions are not al-ways present. Wikipedia does have guidelinesin place to handle disorderly participants andto maintain a neutral point of view (NPOV) inarticles. But Wikipedia clearly relies on the so-cial capital within its community. Studies of lessstrong communities will have to be part of thefuture research in order to determine knowl-edge losses due to lack of social capital. Fur-thermore, Bazaar-style knowledge managementrelies on volunteers who are genuinely inter-ested in the cause. This may not be a paradigmfor organizations where knowledge assets arenot free. Future research will need to explorethe applicability of open source knowledgemanagement when the intellectual property isat least partially proprietary. Finally, the dis-cussed approach to knowledge managementappears to work, partly because it can engageincreasing numbers of participants to deal witha growing task domain. One has to wonderabout the limits of growth of this scenario. Con-sidering both the positive findings and the chal-lenging questions, it appears that Bazaar-styleknowledge acquisition using wikis will be apromising application for the practice of knowl-edge management as well as a rich source ofinteresting research questions.

ACKNOWLEDGMENTSThe research described in this article was

supported in part by CityU grantSMA#9030992. The author completed a sub-stantial portion of the research leading to thisarticle during two sabbaticals at the School ofInformation Science, Claremont Graduate Uni-versity. The research assistance of KarenCheung is thankfully acknowledged.



REFERENCESBenkler, Y. (2002). Coase’s Penguin, or, Linux

and the nature of the firm. The Yale LawJournal, 112(3), 369-446.

Brown, E. (2004). Veni, vidi ...Wiki? Forbes,173(13), 062a.

Buchanan, B., & Smith, R. (1988). Fundamen-tals of expert systems. Annual Review ofComputer Science, 3(7), 23-58.

Bygstad, B. (2003). The implementation puzzleof CRM systems in knowledge-based orga-nizations. Information Resources Manage-ment Journal, 16(4), 33-45.

Cheung, K. S. K., Lee, F. S. L., Ip, R. K. F., &Wagner, C. (2005). The development of suc-cessful on-line communities. InternationalJournal of the Computer, the Internet andManagement, 13(1).

Ciffolilli, A. (2003). Phantom authority, self-se-lective recruitment and retention of membersin virtual communities: The case ofWikipedia. First Monday, 8(12). Retrievedfrom http://firstmonday.org/issues/is-sue8_12/ciffolilli/index.html

Davenport, T., & Prusak, L. (1998). Workingknowledge. How Organizations ManageWhat They Know. Harvard Business SchoolPress.

Drucker, P. (1993). Post-capitalist society. NewYork: Harper Business.

Drucker, P. (1999). Management challenges forthe 21st century. New York: Harper Busi-ness.

Frappaolo, C., & Wilson, L. T. (2003). After thegold rush: Harvesting corporate knowledgeresources. intelligentKM. Retrieved fromhttp://www.intelligentkm.com/feature/feat1.shtml

Hamm, S. (2004, August 18). Linus Torvalds’benevolent dictatorship. Business WeekOnline. Retrieved from http://businessweek.com/technology/content/aug2004/tc20040818_1593.htm

Hof, R. D. (2004, June 7). Something wiki thisway comes: They’re Web sites anyone canedit — And they could transform corporateAmerica. Business Week, 128.

Holsapple, C. W., & Joshi, K. D. (2002). Knowl-edge management: A threefold framework.The Information Society, 18, 47-64.

Jones, C. (1996). Patterns of software systemsfailure and success. Boston, MA: Interna-tional Thomson Computer Press.

Land, R. (2002). Software deterioration andmaintainability: A model proposal. In Pro-ceedings of the Second Conference on Soft-ware Engineering Research and Practicein Sweden (SERPS), Karlskrona, Sweden.

Lee, W. A. (2001). Experian and fair, isaac tweaksubprime scoring. American Banker,167(247), 7.

Leuf, B., & Cunningham, W. (2001). The wikiway: Collaboration and sharing on theInternet. Reading, MA: Addison-Wesley.

Malhotra, Y. (2000). From information manage-ment to knowledge management: Beyond the“hi-tech hidebound” systems. In K.Srikantaiah, & M. E. D. Koenig (Eds.), Knowl-edge management for the information pro-fessional (pp. 37-61). Medford, NJ: Informa-tion Today Inc.

Markus, M. L., Manville, B., & Agres, C. (2000).What makes a virtual organization work —Lessons from the open source world. SloanManagement Review, 42(1), 13-26.

Mayfield, R. (2002, November 19). Occupationalspam. Ross Mayfield’s Weblog: Markets,Technology and Musings. Retrieved fromhttp://radio.weblogs.com/0114726/2002/11/19.html

Mockus, A., Fielding, R. T., & Herbsleb, J. D.(2002). Two case studies of open sourcesoftware development: Apache and mozilla.ACM Transactions on Software Engineer-ing Methodology, 11(8), 309-346.

Newcombe, T. (2000). Opening the knowledgeportal. Government Technology. Retrievedfrom http://www.govtech.net/magazine/gt/2000/sept/knowledge/portal.phtml

Nonaka, I. (1994). A dynamic theory of organi-zational knowledge creation. OrganizationScience, 5(1), 14-37.

Raymond, E. S. (2001). The cathedral and thebazaar. First Monday, 3(3). Retrieved fromhttp://www.firstmonday.dk/issues/issue3_3/



raymond/Scacchi (2004). Understanding free/open

source software evolution: Applying, break-ing and rethinking the laws of software evo-lution. In N. H. Madhavji, M. M. Lehman, J.F. Ramil, & D. Perry. (Eds.), Software evolu-tion. New York: John Wiley and Sons.

Snyder, C. A., & Wilson, L. T. (1998). The pro-cess of knowledge harvesting: A key toknowledge management. In Proceedings ofthe 8th Annual BIT Conference, Manches-ter, UK.

Sviokla, J. J. (1990). An examination of the im-pact of expert systems on the firm: The caseof XCON. MIS Quarterly, 14(2), 127-140.

Turban, E., & Aronson, J. (2000). Decision sup-port and intelligent systems (6th ed.).Prentice Hall.

Ullman, H. (1989, July 17). Machine dreams:Future shock for fun and profit. The NewRepublic.

Wagner, C. (2000). End-users as expert systemdevelopers. Journal of End User Comput-ing, 12(3), 3-13.

Wagner, C. (2003). Knowledge managementthrough end user developed expert systems:Potential and limitations. In M. A. Mahmood(Ed.), Advanced topics in end user comput-ing, Volume 2 (pp. 148-172). Hershey, PA:Idea Group Publishing.

Waterman, D. A. (1986). A guide to expert sys-tems. Reading, PA: Addison-Wesley.

ENDNOTES1 ezboard.com announced that it had hosted

more than 1 million communities on March1, 2002, and claims 14 million registered us-ers as of June 2004.

Christian Wagner is an associate professor of information systems at the City University ofHong Kong. He has carried out research on knowledge acquisition and knowledge managementsince 1986 and has published several articles on the topic. Wagner’s recent research activitiesare documented at http://wagnernet.com. He believes that embracing people’s dual role asboth knowledge providers and knowledge “consumers”, paired with technology that enablesknowledge contribution and aggregation at very low effort, leads to a paradigm shift inknowledge management. Wagner obtained his PhD at the University of British Columbia, andpreviously served on the faculty at the University of Southern California.

Documents

Breaking the Knowledge Acquisition Bottleneck Through ... · Bazaar style development, how-ever, occurs over the Internet in constant pub-lic view. Raymond identified principles of