1 Information retrieval systems in scientific and technological libraries: from monolith to puzzle and beyond
Paul.Nieuwenhuysen@vub.ac.be
Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
Prepared for a presentation at the annual conference organised by IATUL, the International Association of Technological University Libraries (http://www.iatul.org/), in Porto, Portugal, May 2006
2 These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ note: BIBLIO and not biblio
3 Abstract of this presentation
This contribution presents an overview of the evolution of the retrieval systems implemented in scientific and technological libraries to bring users to relevant information sources. We observe a growth in complexity,
1. starting from classical hard-copy catalogues and moving on to the monolithic online public access catalogue,
2. then to a puzzle of software tools that try to cope with the growing complexity of the information sources and services offered by libraries,
3. while the evolution continues and pieces of the puzzle are still missing, so software developers and librarians may want to consider these software tools in their future activities.
More concretely, we consider software systems
- to improve the queries made by users, by expansion or refinement;
- to cope with ambiguity of queries by categorizing search results in topical clusters;
- to visualize data sets (information) in a map on the user's computer display, to assist the user in analysing, interpreting, understanding, and eventually in decision making.
Such visualization tools can be applied to show and reveal, for instance,
- the characteristics of the collection(s) of data/information sources that are made available to the user;
- the relations among words, terms, classification codes and so on, in the process of formulating and improving queries;
- some characteristics of the set of documents that results from a search query by a user.
In conclusion: significant progress is still possible in the area of information retrieval tools offered by libraries.
4 Information retrieval systems in libraries:
1. from monolith
2. to puzzle
3. and beyond:
3.1 Systems to expand or refine a search
3.2 Systems to cluster documents
3.3 Systems to visualize sets of data
contents = summary = structure = overview of this presentation
0. Information retrieval in libraries is (still) evolving
6 Information retrieval in libraries is (still) evolving
This contribution presents an overview of the evolution of the retrieval systems implemented in libraries to bring users to relevant information sources. Complexity is increasing. Moreover, several additional computer-based tools are proposed that may add some value.
1. The past: Information retrieval through the monolith catalogue
8 Classical hard-copy catalogues and more recent computer-based catalogues have fulfilled a central role in most libraries. They can be seen as monoliths: solid, simple, straightforward systems.
2. The present: Information retrieval as a puzzle
10 2.1 Many target databases in the puzzle
Many libraries today offer access to
1. hard-copy collections
2. digital information collections
Therefore they are called hybrid libraries.
Examples of target databases:
- online access catalogues of local print collections
- local online access digital document repository
- external bibliographic databases
- external full-text databases and repositories
- search engines to find external WWW pages
11 2.2 Many library retrieval tools in the puzzle
To offer the contents to users, many computer-based tools can be installed by the library, such as
- a central library catalogue, using database technology
- a WWW site of the library, which offers links to sources
- a system to search through a local document repository
- a system for federated searching through several databases
- a system that generates links from an available starting point to related information sources and services (based on OpenURL)
- (search engines that cover selected WWW pages)
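As an illustration of the federated searching mentioned in the list above, the following is a minimal sketch, not a description of any particular product. It assumes that every target can be wrapped as a function that takes a query and returns records, and that records from different targets share an identifier; the names make_target and federated_search and the record identifiers are invented for this example. In practice, matching records across databases is much harder than this.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Target = Callable[[str], List[Record]]


def make_target(name: str, records: List[Record]) -> Target:
    """Build a toy target that matches the query against record titles."""
    def search(query: str) -> List[Record]:
        return [dict(r, source=name) for r in records
                if query.lower() in r["title"].lower()]
    return search


# Hypothetical stand-ins for a library catalogue and a local repository.
catalogue = make_target("catalogue", [
    {"id": "rec-1", "title": "Programming in Pascal"},
])
repository = make_target("repository", [
    {"id": "rec-1", "title": "Programming in Pascal"},
    {"id": "rec-2", "title": "The pascal as a unit of pressure"},
])


def federated_search(query: str, targets: List[Target]) -> List[Record]:
    """Send one query to every target and merge the answers.

    Records with the same (assumed shared) id are reported once;
    the copy from the first target that returned it is kept.
    """
    merged: Dict[str, Record] = {}
    for target in targets:
        for record in target(query):
            merged.setdefault(record["id"], record)
    return list(merged.values())


results = federated_search("pascal", [catalogue, repository])
for record in results:
    print(record["id"], record["source"], record["title"])
```

The user formulates one query and sees one merged list, which hides one piece of the puzzle from view.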
12 2.3 Need to educate and guide users in information retrieval
The complexity of the information landscape, in particular of the sources, retrieval tools and services offered by many libraries, justifies the metaphor of a puzzle or jigsaw puzzle. Some user guidance is warranted so that everything the library offers can be exploited well and efficiently by well-informed users.
13 2.4 Assembling the pieces of the information retrieval puzzle To reduce the complexity in the eyes of users, it is important that the many retrieval system components in the library are integrated as far as possible. Furthermore, user education and guidance should be well integrated in this information retrieval system. In this line of thinking, it helps when an OpenURL-based generator of links is incorporated in the retrieval system of a library.
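To make the idea of an OpenURL-based link generator more concrete, here is a minimal sketch that turns citation metadata into a link to a link resolver. The resolver address is hypothetical, the function name build_openurl is invented for this example, and the ISSN is a placeholder. The key names are intended to follow the key/encoded-value journal format of the OpenURL standard (ANSI/NISO Z39.88-2004); a real installation should be checked against the resolver's own documentation.

```python
from urllib.parse import urlencode

# Hypothetical base address of the library's link resolver.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Return an OpenURL-style link for a journal article citation.

    `citation` maps short field names (atitle, jtitle, issn, ...) to values;
    empty values are skipped.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key, value in citation.items():
        if value:
            params["rft." + key] = value
    # urlencode percent-encodes the values so the link stays valid.
    return base + "?" + urlencode(params)


url = build_openurl({
    "atitle": "Example article",   # placeholder metadata, not a real reference
    "jtitle": "Example Journal",
    "issn": "0000-0000",
    "volume": "",
    "date": "2006",
})
print(url)
```

Any retrieval component that knows the metadata of a record can produce such a link, which is one way to tie the pieces of the puzzle together.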
3. The future? Missing pieces of the information retrieval puzzle 3.0 Introduction
15 3.0 Introduction
The evolution of the retrieval tools or systems offered by libraries continues, and we suggest that some pieces of the puzzle are still missing. Software developers and librarians may include these in their planning.
16 3.0 Introduction: Basic difficulties in information retrieval
Information retrieval from databases is hindered by several difficulties. These are well known to information experts and scientists, but not to all users. The following are some fundamental problems.
17 3.0 Introduction: Basic difficulties in information retrieval (continued)
Difficulty: A word or phrase is not the same as a concept. This may cause low recall.
[Diagram labels: Word; Concept]
18 3.0 Introduction: Basic difficulties in information retrieval (continued) When the user needs information related to a particular concept or a combination of more elementary concepts, then the user should formulate a query that covers these concepts well, by using not just a single word or term to cover each concept, but by using several words and/or terms, including synonyms, spelling variations, narrower terms, related terms, translations, and so on. The aim is mainly to increase the recall of the search action, by covering the concept better, but also to increase the precision by including the most appropriate words and/or terms in the query.
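The approach described above can be sketched in a few lines: the alternative words for one concept are combined with OR, and the concepts are combined with AND. The function name build_boolean_query and the example terms are invented for this illustration, and the AND/OR syntax with quoted phrases is only an assumption; each retrieval system has its own query language.

```python
def build_boolean_query(concepts):
    """Combine term lists into one Boolean query string.

    `concepts` is a list of lists: each inner list holds the alternative
    words or phrases (synonyms, spelling variants, translations, ...)
    that together cover one concept.
    """
    groups = []
    for terms in concepts:
        # Phrases of several words are quoted so they are searched as a unit.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_boolean_query([
    ["library", "libraries", "bibliotheek"],
    ["information retrieval", "searching"],
])
print(query)
# (library OR libraries OR bibliotheek) AND ("information retrieval" OR searching)
```

Adding alternatives inside a group tends to raise recall, while adding a further group narrows the search.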
19 3.0 Introduction: Basic difficulties in information retrieval (continued)
Difficulty: Many words suffer from ambiguity of meaning. This may cause low precision.
[Diagram labels: Word; Relevant concept; Irrelevant concept (NOT wanted)]
20 3.0 Introduction: Basic difficulties in information retrieval (continued)
Many words and/or terms from a natural language suffer from ambiguity, because natural languages have evolved spontaneously, without strict control. An example is the word pascal, which can have several meanings:
- the philosopher named Blaise Pascal,
- the programming language named Pascal,
- the physical unit of pressure,
- the name of many persons.
21 3.0 Introduction: Basic difficulties in information retrieval (continued)
When a user inserts ambiguous words or terms in a database query, this generates noise: irrelevant entries in the query result set. In other words, this lowers the precision of a search, where precision can be defined more formally as the fraction of the retrieved entries that are relevant. This difficulty can be tackled as early as the stage of database production, in the stage of formulating a query, and also in the stage when the computer system presents the results of a query, for instance by clustering the results in topical categories.
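The effect of ambiguity on precision can be shown with a small calculation. The sketch below uses the usual definitions: precision is the share of retrieved entries that are relevant, and recall is the share of relevant entries that are retrieved. The document identifiers are invented for the pascal example, and returning 0.0 for an empty set is a convention chosen only for this illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    `retrieved` holds the identifiers in the result set,
    `relevant` the identifiers that the user actually needs.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# A user wants documents on the pascal as a unit of pressure,
# but the query "pascal" also returns other meanings of the word.
retrieved = ["unit-1", "unit-2", "philosopher-1", "language-1"]
relevant = ["unit-1", "unit-2", "unit-3"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here half of the result set is noise, so precision is 0.5, and one relevant document is missed, so recall is two thirds.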
3. The future? Missing pieces of the information retrieval puzzle 3.1 System to expand or to limit a first query by a user
23 3.1.1 Classification and thesaurus systems
To cope with the difficulties mentioned above, classification and thesaurus systems have been in use for centuries. Nowadays, however, many information collections have become so large that application of a classification or thesaurus system by the database producer has become too expensive. Any of the well-known, popular, big WWW search engines can serve here as an example.
24 3.1.1 Classification and thesaurus systems
Furthermore, we would ideally prefer a system that is applicable to any target database, or even to several targets at the same time, as in federated searching through several databases in one search action. Therefore, a comprehensive, horizontal, general thesaurus system for some relevant human natural languages would be welcome. Ideally this would be well integrated with the user interface offered to formulate a search query. Application of the thesaurus helps the user to expand or refine an initial query, manually, after consideration of several possibilities.
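A minimal sketch of such a thesaurus step, kept separate from any target database, could look as follows. The tiny thesaurus and the function name suggest_terms are invented for this illustration and do not come from any real thesaurus. The tool only proposes candidate terms, and the choice stays with the user, as described above.

```python
# Toy thesaurus, invented for this example.
TOY_THESAURUS = {
    "pressure": {
        "synonyms": ["force per unit area"],
        "narrower": ["atmospheric pressure", "blood pressure"],
        "related": ["pascal", "barometer"],
    },
}


def suggest_terms(term, thesaurus=TOY_THESAURUS):
    """Return (relation, candidate) pairs the user may add to a query."""
    entry = thesaurus.get(term.lower(), {})
    return [
        (relation, candidate)
        for relation in ("synonyms", "narrower", "related")
        for candidate in entry.get(relation, [])
    ]


suggestions = suggest_terms("Pressure")
for relation, candidate in suggestions:
    print(relation, "->", candidate)

# The user inspects the suggestions and keeps only what fits the need.
chosen = [c for r, c in suggestions if r in ("synonyms", "narrower")]
```

Because the thesaurus lookup happens before the query is sent, the same step can serve one database or several at once.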
25 3.1.1 Horizontal thesaurus systems for natural human language For instance, WordNet offers an open access thesaurus for the English language. A WWW site is devoted to the system: http://wordnet.princeton.edu/ http://wordnet.prin