Common Infrastructure for Knowledge and Information Management

VNU Journal of Science, Natural Sciences and Technology xx (2008) 0-0

Vilas Wuwongse1,*, Thiti Vacharasintopchai2, Neelawat Intaraksa3

1 Professor, School of Engineering and Technology, Asian Institute of Technology, P.O. Box 4 Klong Luang, Pathumthani 12120 Thailand
2 Lecturer, School of Technology, Shinawatra University, 99 Moo 10 Bangtoey, Samkhok, Pathumthani 12160 Thailand
3 Research Associate, The Greater Mekong Subregion Academic and Research Network, Asian Institute of Technology, P.O. Box 4 Klong Luang, Pathumthani 12120 Thailand

* Corresponding author. Tel.: +66 2524-5700. E-mail: [email protected]

Received ...

Abstract. Information and Communication Technology has advanced at an unprecedented rate in recent years, resulting in a mass of electronic content being produced and disseminated at an exponential rate. In general, such content is not systematically organized, which makes it inaccessible when it is most needed. A software infrastructure to commonly support the management of knowledge and information on the Internet and on intranets is proposed. It can be used to capture, preserve and manage information and knowledge so that they stay intact and do not vanish with time. It allows pieces of knowledge in repositories to be located and shared effectively across boundaries. Content from repositories can be readily utilized and published. Opinions and discussions about content can also be captured and archived for later reference. Once adopted and deployed on a large scale, the Common Infrastructure for Knowledge and Information Management will play a crucial role in creating a universal source of knowledge for humanity.

Keywords: knowledge management, information retrieval, software interoperability, digital library

1. Introduction

Information and Communication Technology (ICT) has advanced at an unprecedented rate. We have witnessed constant growth in the performance of computing and communication devices, yet a constant decline in their prices. Personal computers and Internet access have penetrated most households, schools and offices, from capital cities to remote rural towns. Computers and mobile phones come equipped with video and audio recording capabilities like those found in ubiquitous digital cameras and camcorders, making the creation of electronic multimedia content more affordable and more convenient than ever.

This ease of access to ICT tools results in a mass of electronic content being produced and disseminated at an exponential rate. The content is typically stored on personal hard drives and shared on the Internet as e-mails, static web pages, or user-contributed content built with Web 2.0 technologies: forums, weblogs, wikis, content management systems (CMS) and learning management systems (LMS). In general, it is not systematically organized, or is organized merely by folder hierarchies. Information is located and discovered through operating system search features, website search sections, or Internet search engines like Google. At times users cannot find the information they most need, or are presented with piles of duplicated, irrelevant information that demands manual examination, sometimes only to find that the files are damaged and can only partially be recovered. It is not uncommon for pieces of knowledge to go unshared within communities or organizations because of two extremes: people do not know that they exist, or there are simply too many of them.

2. How Problems Can Be Alleviated

The problems described earlier can be categorized into three groups, namely, the problem of metadata control and indexing, the problem of digital content preservation, and the problem of system interoperability. The lack of metadata control and indexing in mainstream information systems leaves users unable to locate and retrieve particular information precisely and in time, or presents them with piles of irrelevant information. The lack of content preservation makes some retrieved content inaccessible because of file damage, or unreliable because the original creators cannot be identified and trusted. The lack of a standard interoperability protocol makes cross-system searches infeasible without a global search engine like Google, leaving users unaware of critical pieces of knowledge that already exist at the time they are most needed.

The metadata problem can be alleviated by information service providers adopting international metadata standards and cataloging their content accordingly. Ontology and natural language processing technologies and tools are available to help cluster and catalog content more properly, and to align and match queries more closely with indexes. The preservation problem can be alleviated by organizations adopting digital library technologies, which enable effective archiving and preservation of digital content so that it stays intact over time. The system interoperability problem can be alleviated by information service providers adopting open standards that allow information in repositories to be exchanged freely among participating organizations by means of metadata. Such standards include the Search/Retrieval via URL (SRU) protocol [1], sanctioned by the United States Library of Congress, for peer-to-peer queries of content metadata in repositories. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [2] and RSS web syndication [3] can be used to harvest metadata from information sources and build up directories of it. Adopting these open standards will widen the range of information sources accessible to users from the consumer's point of view, and increase the effectiveness of delivering new content to potential target groups from the provider's point of view.
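To make the SRU mechanism concrete, the following is a minimal Python sketch, not the authors' code. It builds an SRU 1.2 searchRetrieve URL from the standard request parameters (operation, version, query written in CQL, startRecord, maximumRecords, recordSchema) and reads the record count and Dublin Core titles from a response. The function names, the peer base URLs and the recordSchema value "dc" are illustrative assumptions; real servers may advertise different schema names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRW_NS = "{http://www.loc.gov/zing/srw/}"    # SRU 1.1/1.2 response namespace
DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def build_sru_search_url(base_url, cql_query, start_record=1,
                         maximum_records=10, record_schema="dc"):
    """Return an SRU 1.2 searchRetrieve URL for one repository."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,               # CQL, e.g. dc.title = "rice"
        "startRecord": start_record,      # 1-based position of the first record
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,    # schema name is server-specific (assumed "dc")
    }
    return base_url + "?" + urlencode(params)


def parse_sru_response(xml_text):
    """Return (numberOfRecords, list of dc:title values) from a response body."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext(SRW_NS + "numberOfRecords", default="0"))
    titles = [el.text for el in root.iter(DC_NS + "title") if el.text]
    return count, titles


# Peer-to-peer use: send the same CQL query to each (hypothetical) peer repository.
PEERS = ["http://dl-a.example.org/sru", "http://dl-b.example.org/sru"]
urls = [build_sru_search_url(p, 'dc.subject = "agriculture"', maximum_records=5)
        for p in PEERS]
```

Because every participating repository understands the same URL pattern, a client can query peers directly without a central index, which is the interoperability property the section attributes to SRU.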
3. A Common Infrastructure for Knowledge and Information Management

A system architecture that unifies the individual solutions described previously into an infrastructure for the management of information and knowledge on the Internet and on intranets is proposed in Figure 1. It consists of four layers, namely, the Network and Internet layer, the System Software layer, the Data Exchange and Metadata layer, and the Application layer.

Fig. 1. System Architecture of the Common Infrastructure for Knowledge and Information Management.

The Network and Internet layer is the basic infrastructure for information exchange between information services. Such an infrastructure is provided by Internet service providers and is commonly taken for granted in software development.

The System Software layer is responsible for the creation, storage and utilization of knowledge. It is composed of three sub-layers: the Knowledge Creation and Capturing layer, which records personal knowledge and experiences as well as information and opinions as digital content; the Data Center layer, which systematically collects, catalogs and preserves the digital content so that it stays intact and is ready to be retrieved on demand; and the Data Utilization layer, which enables intelligent and effective searches for pieces of knowledge in data centers so that they can be quickly posted on websites. Content management systems and learning management systems allow general users without programming skills to publish content online conveniently. Querying data centers loaded with useful pieces of knowledge is analogous to querying a human-filtered search engine.

The Data Exchange and Metadata layer comprises the standards and software services that facilitate the exchange of knowledge pieces between information systems. These include the SRU, OAI-PMH and RSS protocols introduced earlier, as well as the system software components that handle information exchange based on those protocols.

The Application layer involves the human processes that utilize individuals' knowledge and experiences captured, shared and discovered across information systems to perform tasks in various disciplines, which could range from education, preservation of cultural heritage, and planning and development to agricultural and environmental activities.
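The paper does not say how the Data Center layer keeps content intact. One common digital-preservation technique is fixity checking: record a cryptographic checksum when an item is ingested and re-verify it before the item is served, so silent file damage is detected. The minimal sketch below applies that technique with SHA-256 from Python's hashlib; the class DataCenter and its methods are hypothetical names, and a real repository such as DSpace keeps items on disk or in a database rather than in a dictionary.

```python
import hashlib


class DataCenter:
    """Hypothetical in-memory store that keeps a SHA-256 fixity value per item."""

    def __init__(self):
        self._items = {}  # item_id -> {"content", "metadata", "sha256"}

    def ingest(self, item_id, content, metadata):
        """Store content with its catalog metadata; return the recorded checksum."""
        digest = hashlib.sha256(content).hexdigest()
        self._items[item_id] = {"content": content,
                                "metadata": dict(metadata),
                                "sha256": digest}
        return digest

    def verify(self, item_id):
        """Recompute the checksum and compare it with the value recorded at ingest."""
        item = self._items[item_id]
        return hashlib.sha256(item["content"]).hexdigest() == item["sha256"]

    def retrieve(self, item_id):
        """Serve content only if it still matches its fixity value."""
        if not self.verify(item_id):
            raise OSError(f"fixity check failed for {item_id}")
        return self._items[item_id]["content"]
```

A scheduled job that calls verify on every item would report damaged files before a user runs into them, rather than after, which addresses the "found only to be damaged" situation described in the Introduction.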
4. Prototype System

The components of the proposed infrastructure are being developed at the Asian Institute of Technology based on open-source software tools. Core components in the Data Center, Data Utilization, and Data Exchange and Metadata layers have been implemented. Components for the Knowledge Creation and Capturing layer are being designed. The DSpace [4] and Greenstone [5] digital library servers were chosen as the engines for collecting and preserving digital content in knowledge repositories. The Moodle [6] learning management system and the Drupal [7] content management system were chosen as the platforms for utilizing knowledge in repositories for academic and non-academic purposes, respectively.

A metadata harvester has been developed to aggregate metadata from various information sources into a central directory. The harvested metadata include those from digital libraries and library management systems, obtained through the OAI-PMH protocol, as well as those from Web 2.0 sites, obtained through RSS web syndication.
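A harvester of the kind described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's implementation: it issues OAI-PMH ListRecords requests with metadataPrefix=oai_dc, follows resumptionToken values until an empty token ends the list, skips records whose header is marked deleted, and maps RSS 2.0 items onto the same field names so both sources can share one directory. The function names, the record shape (field name to list of values) and the injected fetch callable are hypothetical; network access is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, List, Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

Record = Dict[str, List[str]]  # metadata field -> values (hypothetical shape)


def oai_list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """First request names the format; later requests send only the resumptionToken."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def harvest_oai(base_url: str, fetch: Callable[[str], str]) -> Iterator[Record]:
    """Yield Dublin Core records, following resumption tokens until exhausted."""
    token = None
    while True:
        root = ET.fromstring(fetch(oai_list_records_url(base_url, token)))
        for rec in root.iter(OAI_NS + "record"):
            header = rec.find(OAI_NS + "header")
            if header is None or header.get("status") == "deleted":
                continue  # deleted records carry no metadata
            fields: Record = {"identifier": [header.findtext(OAI_NS + "identifier", "")],
                              "source": ["oai-pmh"]}
            for el in rec.iter():
                if el.tag.startswith(DC_NS) and el.text:
                    fields.setdefault(el.tag[len(DC_NS):], []).append(el.text.strip())
            yield fields
        token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        # An empty or missing resumptionToken marks the end of the list.
        token = token_el.text.strip() if token_el is not None and token_el.text else None
        if not token:
            break


def harvest_rss(xml_text: str) -> Iterator[Record]:
    """Map RSS 2.0 <item> elements onto the same field names as the OAI records."""
    channel = ET.fromstring(xml_text).find("channel")
    for item in channel.iter("item"):
        yield {
            "identifier": [item.findtext("guid") or item.findtext("link") or ""],
            "title": [item.findtext("title", "")],
            "description": [item.findtext("description", "")],
            "subject": [c.text for c in item.findall("category") if c.text],
            "source": ["rss"],
        }
```

A central directory is then just the concatenation of both streams, for example list(harvest_oai(url, fetch)) + list(harvest_rss(feed_text)); passing fetch in as a parameter keeps the harvesting logic testable without a live repository.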
A single search engine has been developed to assist users in exploring this directory. Content can be retrieved by keywords in metadata fields, and can be browsed and explored by metadata facets. SRU support for peer-to-peer metadata queries has been added to the original DSpace code and is being incorporated into the single-search facility.

An offspring of this implementation was applied in an e-learning application under the title The Knowledge, Imagination, Discovery and Sharing (KIDS-D) Project, in which networks of digital libraries are created to archive and exchange useful learning materials among teachers and students at pilot schools and institutes across Thailand. Rare historical books from the National Archive have also been digitized and preserved in the digital libraries. It is hoped that these materials, together with the discussions that students and teachers hold about them, will help alleviate the shortage of academic resources in Thailand.

It should be noted that, unlike other software infrastructures in which components are tightly coupled and deployed, our implementation of the Common Infrastructure is loosely coupled: software components interoperate through open standards and protocols and can be readily replaced by alternative standard-compliant components. In this regard, the implementation of the Common Infrastructure for Knowledge and Information Management is also highly flexible and scalable.
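The keyword retrieval and facet browsing over the central directory described in this section can be illustrated with the short Python sketch below. It is not the prototype's search engine: it scans records linearly with case-insensitive substring matching instead of using an inverted index, and the function names, the searched fields and the record shape (field name to list of values) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, List[str]]  # metadata field -> values (hypothetical shape)


def keyword_search(records: Iterable[Record], terms: str,
                   fields: Iterable[str] = ("title", "description", "subject")) -> List[Record]:
    """Return records whose chosen metadata fields contain every query term (case-insensitive)."""
    wanted = [t.lower() for t in terms.split()]
    fields = tuple(fields)
    hits = []
    for rec in records:
        text = " ".join(v for f in fields for v in rec.get(f, [])).lower()
        if all(t in text for t in wanted):
            hits.append(rec)
    return hits


def facet_counts(records: Iterable[Record], facet: str) -> Counter:
    """Count how many records carry each value of one metadata field."""
    counts = Counter()
    for rec in records:
        counts.update(set(rec.get(facet, [])))  # count each value once per record
    return counts


def filter_by_facet(records: Iterable[Record], facet: str, value: str) -> List[Record]:
    """Narrow a result set to the records that have the selected facet value."""
    return [rec for rec in records if value in rec.get(facet, [])]
```

A typical interaction runs keyword_search first, shows facet_counts for fields such as subject or source beside the results, and calls filter_by_facet when the user clicks a value. Since the functions depend only on field names and not on where a record came from, records from any standard-compliant source can be added without changing the search code, in line with the loose coupling described above.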
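5. Conclusion

This paper has presented a software infrastructure to commonly support the management of knowledge and information on the Internet and on intranets. The infrastructure can be used to capture, preserve and manage the information and knowledge that belong to communities and organizations so that they stay intact and do not vanish with time. It allows pieces of knowledge in repositories to be located and shared effectively across boundaries, within an organization or between organizations and nations, through open standards and without requiring a homogeneous software suite. Content from repositories can be quickly utilized and published with the assistance of learning management systems and content management systems. Opinions and discussions about content can also be captured and archived back into the repositories for later reference. Once adopted and deployed on a large scale, the Common Infrastructure for Knowledge and Information Management will play a crucial role in creating a universal source of knowledge for humanity.

Acknowledgements

The authors would like to thank the Royal Thai Government for its financial support through the Greater Mekong Subregion Academic and Research Network Knowledge Management Toolkit and Applications project. They would also like to thank the many friends and colleagues whose constructive comments have contributed to this research. The developers of the free and open-source software tools used are gratefully acknowledged for their devoted time and contributions.

References

[1] The Library of Congress, Search/Retrieval via URL, June 2008. Available online: http://www.loc.gov/standards/sru
[2] The Open Archives Initiative, Open Archives Initiative - Protocol for Metadata Harvesting, October 2004. Available online: http://www.openarchives.org/OAI/openarchivesprotocol.html
[3] RSS Advisory Board, RSS 2.0 Specification, 2006. Available online: http://www.rssboard.org/rss-specification
[4] DSpace, DSpace - An Open-source Solution for Accessing, Managing and Preserving Scholarly Works, 2008. Available online: http://www.dspace.org
[5] Greenstone, Greenstone Digital Library Software, 2008. Available online: http://www.greenstone.org
[6] Moodle, Moodle - A Free Open Source Course Management System for Online Learning, 2008. Available online: http://moodle.org
[7] Drupal, Drupal - An Open Source Content Management System Platform, 2008. Available online: http://drupal.org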