
Pergamon Library Acquisitions: Practice & Theory, Vol. 19, No. 2, pp. 243-249, 1995

Copyright © 1995 Elsevier Science Ltd Printed in the USA. All rights reserved

0364-6408/95 $9.50 + .00

0364-6408(95)00013-5

CHARLESTON CONFERENCE 1994

BUILDING AND ORGANIZING INTERNET COLLECTIONS

WILLIAM A. BRITTEN

Automation Librarian University of Tennessee

647 Hodges Library Knoxville, TN 37996-1000

Internet: britten@utklib.lib.utk.edu http://gopher.lib.utk.edu/bill.html

Abstract -- This paper discusses the evolution of Internet information servers in libraries, from their typical origin as maverick systems department experiments, to the current state where libraries must address the need to incorporate network-based information into the traditional work of collection management, acquisitions, and cataloging. Does the traditional model of information acquisition and organization apply to network-based information? In a world where nearly anyone can be both a provider and a consumer of information from the comfort of their personal computer, what role do libraries have? How should the library profession respond to the proliferation of gopher and world wide web servers?

Keywords -- Electronic, Internet, Collection Management, Acquisitions.

INTRODUCTION

Just a short time ago Internet information servers in libraries, on platforms such as gopher and world wide web, were uncommon. Typically, they were experimental services launched by Library Systems departments, many times without official library sanction. However, network-based information has proliferated, and sophisticated information retrieval clients such as Mosaic have become widespread on the desktop computers of librarians. In addition, most academic libraries have already provided public access to these tools or are planning to. The rush to provide access to the Internet has resulted in a corresponding surge in the creation of gopher and web servers among libraries. The subject of this paper is the nature of Internet collections, how they are built and organized today, and what changes may arrive in the near future.

THE PROLIFERATION OF NETWORK INFORMATION

Academic libraries find themselves in a landscape of accelerating change. Internet information is growing at a ferocious rate. Not since the race to the moon has there been a national frenzy of interest like that which surrounds the "information superhighway." Full Internet connection (e.g., advanced clients like Mosaic) to the home, K-12, and university dorm rooms is imminent. Commercial interests are pouring money into Internet information services, including organization and retrieval schemes. Yet, even as the quality of network information has improved, access and organization remain fairly primitive. Nearly anyone can be both a consumer and provider of information, from the privacy and comfort of the desktop computer. Library managers find it difficult to know exactly what the library's role is within this climate of information as global network commodity.

For example, the following are network information services currently available via world wide web:

• Le WebLouvre URL http://mistral.enst.fr/~pioch/louvre/ An impressive online art museum specializing in Impressionist works.

• British Poetry 1780-1910 URL http://www.lib.virginia.edu/etext/britpo/britpo.html An archive of scholarly editions from Alderman Library, University of Virginia

• Views of the Solar System URL http://www.c3.lanl.gov/~cjhamill/SolarSystem/homepage.html An extensive compilation of NASA photos into an educational tour

• Stat-USA URL http://www.stat-usa.gov/ Includes the National Trade Data Bank, Economic Bulletin Board, and other government statistical sources

• 1990 Census Lookup URL http://www.census.gov/ Another valuable government data source

• Novell Online Services URL http://www.novell.com/ An excellent example of commercial use of the Internet

• Encyclopedia Britannica Online URL http://www.eb.com/eb.htm You can take a look, but you cannot search without subscribing!

• Internet Underground Music Archive URL http://sunsite.unc.edu/ianc/index.html The wonderful playfulness and anarchy of the web in action!

• Bill's Lighthouse Getaway! URL http://gopher.lib.utk.edu/lights.html Proof that anyone can be an Internet author and publisher

ORGANIZATION AND RETRIEVAL OF NETWORK INFORMATION

This is just a very small sampling of the scholarly, government, commercial, and other information available on the Internet. Academic libraries have started to organize and provide access to this onslaught of network resources, first with gopher servers, and now with world-wide web servers. Typically, these servers are a browsing environment, with hierarchical organization schemes. For example, at the University of Tennessee, Knoxville, the web and gopher servers (http://www.lib.utk.edu and gopher://gopher.lib.utk.edu) offer a metaphor of traditional library divisions (Figure 1), including "electronic books," "electronic reference," and "information by subject." When a user browses down into the "information by subject" area, the next menu is a concise listing of the LC class letters A-Z with some associated subject terms (Figure 2).

For example, P: Language, Literature, Film, Journalism, Theater. Browsing down into the "P's" would then bring up a listing of hypertext links to some actual resources (e.g., the British Poetry resource listed earlier). Other libraries have organized Internet resources in other ways. North Carolina State University (http://www.lib.ncsu.edu/disciplines/) has organized by the traditional subject disciplines of Humanities, Sciences, and Social Sciences. Lower levels of the subject hierarchy reveal more specific subject divisions. There are now many libraries building this type of organized environment for network information, in the same way that libraries have always provided organized environments for traditional information. Even the Library of Congress (http://lcweb.loc.gov/) has a web server.

Figure 1. The top level of the UTK Libraries server ("Welcome to UTK Libraries," Hodges Library): a menu offering Welcome to UTK Libraries!, UTK Libraries Catalog and other Services, Other Library Catalogs & Info Systems, Electronic Books, Electronic Journals, Electronic Reference, Information by Subject, Other Internet Resources, Other UTK Information, What's New in OLIS, Search the Gopher menus at UT - Knoxville, and World Wide Web Search Engines, plus links to the informal "Back Page" and a description of the server.


Figure 2. The "Information by Subject" menu: LC class letters with associated subject terms (A: General Information; B: Philosophy, Psychology, Religion; C-F: History; G: Geography, Oceanography, Maps, Anthropology, Recreation; H: Economics, Sociology; J: Political Science; K: Law; L: Education; M: Music; N: Fine Arts; P: Language, Literature, Film, Journalism, Theater; Q: Science; R: Medicine; S: Agriculture, Animal Medicine & Culture; T: Technology, Engineering, Space, Photography; Z: Library Science), plus Interdisciplinary Subject Collections at Other Sites.
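As an illustration of how a server can represent such a class-letter hierarchy, here is a minimal sketch in Python. The class labels follow Figure 2 and the British Poetry link is the one cited earlier, but the data structure and rendering function are hypothetical, not the UTK implementation.

```python
# A minimal sketch (not UTK's actual code) of a class-letter browsing
# hierarchy rendered as an HTML menu. Labels follow Figure 2; everything
# else is illustrative.

lc_subjects = {
    "P": {
        "label": "Language, Literature, Film, Journalism, Theater",
        "links": [
            ("British Poetry 1780-1910",
             "http://www.lib.virginia.edu/etext/britpo/britpo.html"),
        ],
    },
    "Q": {"label": "Science", "links": []},
}

def render_menu(subjects):
    """Render the hierarchy as nested HTML lists, one item per class letter."""
    lines = ["<h1>Information by Subject</h1>", "<ul>"]
    for letter, entry in sorted(subjects.items()):
        lines.append(f"<li>{letter}: {entry['label']}")
        if entry["links"]:
            lines.append("<ul>")
            for title, url in entry["links"]:
                lines.append(f'<li><a href="{url}">{title}</a></li>')
            lines.append("</ul>")
        lines.append("</li>")
    lines.append("</ul>")
    return "\n".join(lines)

print(render_menu(lc_subjects))
```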

In addition to libraries, many computing centers, corporations, and individuals are also attempting to organize the Internet. Some of these efforts have vast collections of links, often using organization schemes that might make a librarian cringe. One of the most impressive of these efforts is the Yahoo Guide to WWW (http://akebono.stanford.edu/yahoo/). Constructed by two self-proclaimed "yahoos," this server has organized close to 30,000 Internet links (as of March 1995) in an informal subject hierarchy, using common subject terms probably not drawn from any authority list. The system includes a form for anonymous users to contribute new links. The "yahoos" are David Filo and Jerry Yang, two graduate students at Stanford University. This server is a wonderful example of the entrepreneurial spirit and amazing nature of the Internet that allows individuals to accomplish what might have required institutions in a traditional environment. However, one of the downsides to this entrepreneurial spirit is a lack of permanence. In fact, when you read this, Yahoo may no longer exist. It can be argued that the institutional involvement of libraries in the organization process would result in more dependability and continuity. Conversely, it can also be argued that impermanence and change are the very essence of the Internet.

One further step in the evolution of the many independent attempts to organize the Internet is the concept of the Virtual Library (http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html). The current WWW Virtual Library is an all-volunteer effort administered in Switzerland at the site where the world-wide web originated. Volunteers at sites around the world agree to maintain a web server in a subject area. For example, biological sciences is maintained at Harvard, chemistry is maintained at UCLA, etc. Each site develops a web server that links to other web servers in its subject area. The distributed subject servers are then pulled together into a "virtual library" at the main server in Switzerland. Conceptually, this is a wonderful project. It addresses the inefficiency of many libraries replicating Internet collections when the network environment allows one collection to serve all. However, in its reliance on individuals as maintainers, the WWW Virtual Library seems vulnerable to disorder as these maintainers change jobs or become busy with other projects.

Is there a niche here for libraries? I think so. The long-term institutional perspective of libraries, librarians' expertise in information organization, and the library profession's organizational structure (ALA, CLA, PLA, etc.) would add both collection management expertise and permanence to a virtual library project. Subject bibliographers from different libraries could even pool their expertise through collaboration over the network to help build these virtual subject collections. An ALA Virtual Library Collection would be preferable to needlessly building dozens of similar collections, as is the trend now among academic libraries.

Even higher on the evolutionary tree of network information-locating tools are the "search engines," which allow searching for information in several ways. Some search through directories of gopher or web sites, while others search document titles or even the text of the documents. Many offer templates for the user to construct very sophisticated searches. (For a list of search engines connect to http://www.lib.utk.edu/search.html). These search engines are often associated with Internet "robots" or "web crawlers" that are programmed to roam the Internet looking for new information. These robots then build a searchable database. For example, the Lycos project at Carnegie-Mellon University (http://fuzine.mt.cs.cmu.edu/mlm/lycos-home.html) has over two million URLs (uniform resource locators) in its database as of this writing. A similar project is the Harvest Information Discovery and Access System, a collaborative project of the Internet Research Task Force Research Group on Resource Discovery (IRTF-RD). Harvest (http://harvest.cs.colorado.edu/) is an integrated set of tools to gather, extract, organize, search, and replicate information across the Internet.
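To make the idea concrete, the following is a toy sketch in Python of what such a robot does: fetch a page, record its title in a searchable index, and queue the links it finds for later visits. The seed URL is a placeholder, and real systems such as Lycos and Harvest are far more sophisticated; this is an illustration, not their code.

```python
# A toy web "robot": breadth-first crawl from a seed URL, building a
# {url: title} index that a search engine could then query.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collect the <title> text and all <a href> links from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seed, limit=10):
    """Visit pages breadth-first, indexing titles and following new links."""
    index, queue, seen = {}, deque([seed]), {seed}
    while queue and len(index) < limit:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable pages are simply skipped
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.title.strip()
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


if __name__ == "__main__":
    for url, title in crawl("http://www.example.edu/").items():
        print(title or "(untitled)", "->", url)
```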

CHARACTERISTICS OF NETWORK INFORMATION

There are differences between traditional paper-based resources and the new network information which have an impact on collection, organization, and use. First, most network information is still free. That may change rapidly as the network is commercially exploited, but the Internet's history as a free academic environment prevails. One notable exception to this is the Encyclopedia Britannica Online. Because most of the resources are free, libraries can "select" and "catalog" (i.e., incorporate into a network server) multiple "copies" (actually it's one copy with multiple access routes). Furthermore, one network "copy" of a resource can serve many libraries. That is, the network environment allows a resource to reside at one location while being accessed from many locations, eliminating the need for redundant storage. Unfortunately, there is a downside to this characteristic: while most traditional resources tend to stay on the library's shelves until they circulate, network information has a maddening habit of moving around. If the owner of the information moves it to another computer, or even a different directory, all of the network links to that information will suddenly be inaccurate and inoperative. This is currently a serious problem for gopher and web server maintainers.
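The maintenance burden this creates can be eased by automation. Below is a minimal sketch of a link checker that re-tests every URL in a collection and reports the ones that no longer respond; the two URLs are examples drawn from the list earlier in the paper, and the code is illustrative rather than any particular library's tool.

```python
# A minimal link checker: re-test each URL a server points at and report
# the links that have moved or vanished.

from urllib.request import Request, urlopen

# Example URLs from the list above; a real maintainer would load the full
# set of links from the server's own pages.
collection = [
    "http://www.stat-usa.gov/",
    "http://www.census.gov/",
]

def find_dead_links(urls):
    """Return the URLs that no longer respond, i.e. links that have broken."""
    dead = []
    for url in urls:
        try:
            # HEAD asks only for headers, avoiding a full download.
            with urlopen(Request(url, method="HEAD"), timeout=10):
                pass
        except OSError:  # HTTPError and URLError both derive from OSError
            dead.append(url)
    return dead

for url in find_dead_links(collection):
    print("broken link:", url)
```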


Another quality of gopher and web servers is that the collection is the catalog. Traditional online catalogs serve as a surrogate for the collection, referring users to the actual items by way of the classification scheme. There is no need for this in a hypertext-based network server, since the user searches, locates, and displays the full content of the information all within the interface of a network client such as Mosaic. As users become more familiar with network clients and the available search engines, the work currently underway by some libraries to enter URLs in the online catalog will seem inappropriate.

Network resources, as they have developed, are often interdisciplinary. This could make them problematic from a traditional cataloging perspective, but because the acquisition of a network resource usually means installing a pointer to a remote computer, it is possible to create multiple links to a resource from whatever subject areas seem applicable. For example, a network link to an environmental, or "green," resource might be created from the biosciences subject area as well as the political science area.

An extension of this ability to freely link users to information is the possibility of linking to a subset of a resource -- a type of network analytics. Today's client/server environment does not mandate that users connect (login) to large information services. Instead, a very short-term transaction takes place between a client and the server, during which a very specific piece of information passes from server to client (user). This characteristic of network information allows libraries to tailor information delivery. Using the environmental resource again as an example, a library could install a link from its electronic reference area to a Green Glossary which is part of the environmental server's information. Another link might go from the Political Science area of the library's server to a Congressional Scorecard on the Environment. This type of linking to subsections of a resource can be done for book chapters, journal articles, etc., and is the essence of the most profound difference between traditional and network information.
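Both practices amount to nothing more than storing the same pointer, or a pointer plus a fragment identifier, under more than one subject heading. The sketch below illustrates this; the environmental server's address, file names, and fragment are hypothetical.

```python
# A sketch of the two linking practices described above: the same remote
# resource registered under multiple subject areas, and "network analytics"
# links that target a subsection of a resource.

GREEN_SERVER = "http://enviro.example.org/green"  # hypothetical server

subject_links = {
    "Biosciences": [
        ("Green Resource", GREEN_SERVER + "/"),
    ],
    "Political Science": [
        ("Green Resource", GREEN_SERVER + "/"),  # same pointer, second subject
        ("Congressional Scorecard on the Environment",
         GREEN_SERVER + "/scorecard.html"),      # one document within the resource
    ],
    "Electronic Reference": [
        ("Green Glossary", GREEN_SERVER + "/guide.html#glossary"),  # a fragment
    ],
}

# "Acquiring" the resource a second time costs nothing: every entry is only
# a pointer, and all of them resolve to the same remote copy.
for subject, links in subject_links.items():
    for title, url in links:
        print(f"{subject}: {title} -> {url}")
```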

I think that the essential nature of Internet information today is that the building blocks of the information package are available on the network. Discrete bits of network information (an image, a document, a recording) can be packaged into one or more resources, even if the information bits are physically remote from each other. World wide web documents are an example of this. Most web pages are a concoction of coded text plus links to graphics, sound, animation, or whatever. These pages do not exist in any traditional sense -- they are an illusion of sorts -- and can be reassembled in a different form with or without the original owner's knowledge. Two implications of this come to mind immediately: first, copyright becomes a very slippery concept in this environment; and second, some of the concepts used to define traditional packages of paper-based information (e.g., journal issues and volumes) may not be appropriate on the Internet. Network copyright is too vast an issue to be addressed here, but I would like to discuss network information packaging, specifically as it relates to journals. The concept of an electronic journal has been kicked around the network for several years, but has never crossed a threshold of viability. Initial listserv-based journals, and later gopher-based journals, were limited by the mandate for ASCII (no graphics) text, which both publishers and readers found boring. Attempts at launching bit-mapped (scanned) images of journal pages solved the no-graphics problem, but suffered from huge storage requirements and difficulties in displaying and printing the pages. The current web clients and servers, using the Hypertext Markup Language (HTML), offer a more attractive scenario for electronic journals. For example, the Indiana Journal of Global Legal Studies (http://www.law.indiana.edu/glsj/glsj.html) illustrates the formatted multiple-font text, the graphics, and hypertext "page turning" of today's web journals (a minimal sketch of such a page follows below). We may have finally arrived at an alternate model for publishing and distributing scholarly articles. Note that I said scholarly articles rather than journals. Is the packaging concept of the journal necessary for network information? Why not just a database of articles? In his speech closing the 1994 Charleston Conference, Clifford Lynch asked the question:


"if you (the collection management librarians) could get a feed from the publisher of everything they offer, for one price, would you jump at the chance?" I might modify Clifford's question to ask: if a publisher would make available a central database of all their articles, in a standard format such as HTML, for one price, would you jump at the chance to access that database?

ASSERTIONS AND PREDICTIONS

In conclusion, I would like to offer several assertions, and risk a few predictions:

• Library Systems Departments cannot continue to build network information systems alone. The collection managers, selectors, catalogers, etc. must be involved as Internet information moves from experimental to authentic.

• The traditional model of each library building a collection is not appropriate except for locally unique items. The idea of the virtual network library is attractive, but the library profession needs to formalize the process of assigning responsibility for building these virtual subject collections. It might be possible for virtual collections to be built through network collaboration among subject bibliographers.

• An alternate model for publishing and distributing journals may be viable as the web-based mark-up language, HTML, matures. Who publishes these materials is up for grabs.

• Network information will get increasingly smart; it will know what it is and where it is. Standards will emerge for coding Internet information to allow network robots to perform Internet-wide indexing (see the sketch following this list).

• The network will become dominated by very intelligent search engines that will replace today's browsing hierarchies.
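As a speculative sketch of the self-describing coding anticipated above, consider descriptive <meta> tags embedded in a page header, which a robot could harvest without parsing the body text. The tag names and page content here are invented; no such standard existed when this paper was written.

```python
# A speculative sketch of "smart" network information: a page declares its
# own subject and maintainer in <meta> tags, and a robot reads only those.

from html.parser import HTMLParser

page = """<html><head>
<title>Views of the Solar System</title>
<meta name="subject" content="Astronomy; NASA photographs">
<meta name="maintainer" content="webmaster@example.gov">
</head><body>...</body></html>"""

class MetaReader(HTMLParser):
    """Collect name/content pairs from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.fields[attributes["name"]] = attributes["content"]

reader = MetaReader()
reader.feed(page)
print(reader.fields)   # the record a robot would index for this page
```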

If the network does become dominated by robot crawlers and automated indexing, it is not clear what will constitute a collection and what the library's role will be. As a systems librarian who has been among the first to develop a gopher server, and then a web server, at an academic library, I have found it increasingly difficult to keep up with the pace of change. And yet, even as the Internet loses its cult status, and is buffeted by the forces of commercialization and very serious computer science, I find it to be a most amazing time to be an information professional.