Search fatigue: finding a cure for the database blues


  • 8/14/2019 Search fatigue: finding a cure for the database blues


    American Libraries | March 2007 | 46

    SEARCH FATIGUE

    Finding a cure for the database blues

    by Jeffrey Beall


    It is a feeling of frustration and dissatisfaction experienced by searchers trying unsuccessfully to find desired information in a database. It results when searchers cannot find what they are looking for and when they repeatedly get results that do not match their information needs. This is search fatigue.

    You've probably experienced search fatigue yourself: You try several searches to find information you think ought to be present in a database, but no matter how many different ways you enter your search, you fail to find what you're looking for.

    As online databases grow in size and as the simple search box, such as the one popularized by Google, becomes the norm, search fatigue will become an increasing problem. Fortunately, librarians are well-positioned to help database searchers overcome search fatigue by designing and implementing databases and search systems that rely on value-added features that provide searchers satisfying and comprehensive search results.

    The chief cause of search fatigue is a reliance on keyword searching. A number of inherent flaws plague keyword, or full-text, searching. One major flaw in keyword searching is that of synonyms. For example, a searcher looking for information on false teeth will probably miss all the resources that use the term "dentures."

    A more extreme example of synonyms is the term "Atlantic cod." There are at least 60 different terms for this species of fish, including codling, Newfoundland fish, schrod, shoal fish, and winter fish, all different names for the same fish. But very few resources likely use all 60 terms; in fact, most probably use just a single term. So any keyword search on a single name will likely miss all the other resources that use different names. Users who refer to Atlantic cod by one of its less-common names will probably find little information on the topic. In this way, keyword searching shortchanges the minority who use the less common term and favors those who use more common terms for a given topic; it also yields incomplete search results.

    Another major weakness of keyword searching is its inability to deal well with homonyms. One example is "leaks." There are at least two major meanings of the word: One refers to an unintentional hole that allows something to escape, such as water from a pipe or air from a tire; but people also use the term to refer to supposedly secret information that has been divulged to the mass media. A keyword search on "leaks" is going to pull up resources without distinguishing between pipes and politics. Searchers will have to wade through the results and determine which documents match their needs, a time-consuming process that results in search fatigue.

    Keyword searching also functions poorly in searches that use common terms or names, since these retrieve many results and are difficult for the search software to rank by relevance. For example, searching for information about Los Angeles or a common name such as Mike Wilson will retrieve abundant results in most systems, and many of the search results won't have anything to do with what the searcher is looking for. Recently I needed to find information about someone named Michael Ensign. But because there is an actor (a different person) with that name, most of my search results in Google were about the actor, since those were ranked highest by Google. This ranking caused me search fatigue because it required me to look through many results, and ultimately I was unable to find the information I needed.

    Another weakness of keyword searching is its inability to effectively search vague terms and concepts. It's difficult to get good search results for searches about "life" or "health" because these terms are so imprecise. Searching such terms generally yields very large result sets, sets that are often too large to sort through. Large result sets are one of the chief causes of search fatigue.

    Keyword searching also generally fails to pull up documents in languages other than that of the original search. For example, if you search for something using a French term, most of the results will be in that language. The exceptions include documents written in both French and English and documents that contain cognates (words spelled exactly the same) in both languages. But generally, keyword searching is monolingual; this can be a source of search fatigue by eliminating relevant documents. A salient example is Brazil: In Portuguese, the national language of Brazil, the country's name is spelled "Brasil." So a keyword search for "Brazil" will probably exclude most of the documents that originate from the country itself.

    Relevancy ranking itself can be another cause of search fatigue. Relevancy is a computer's way of ranking what it thinks are the most relevant search results, listed in order from the top of the retrieval display. But it's difficult for a computer to know what is most relevant. Moreover, different search systems use different algorithms to determine relevancy, so what appears at the top in one system may not rank that high in others.

    The whole idea of relevancy started with keyword search engines. Before keyword search engines, people searched metadata-enabled search engines and had their results ranked alphabetically. Alphabetical sorting is about as natural an order as one can get, because it's easy and we're accustomed to it. But keyword search engines cannot sort results alphabetically because they don't know what elements to base the sort on. Instead, they use relevancy ranking, which is a mysterious, inconsistent, and unnatural means of sorting search results, and a source of perpetual search fatigue.
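
The contrast between the two orderings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three records are invented, and a raw count of how often the term appears stands in for relevancy ranking, which real search engines compute in far more elaborate and varied ways. The names `by_title` and `by_term_frequency` are hypothetical.

```python
# Hypothetical records, invented for illustration only.
RECORDS = [
    {"title": "Zoology of the Atlantic cod", "text": "cod cod cod habitat and range"},
    {"title": "Atlantic fisheries report", "text": "cod landings fell"},
    {"title": "Cooking with cod", "text": "cod recipes, cod stock"},
]


def by_title(records):
    """Alphabetical order: possible only because each record has a title field."""
    return [r["title"] for r in sorted(records, key=lambda r: r["title"].lower())]


def by_term_frequency(records, term):
    """A crude stand-in for relevancy: order hits by how often the term occurs."""
    def score(record):
        return record["text"].lower().split().count(term.lower())

    hits = [r for r in records if score(r) > 0]
    return [r["title"] for r in sorted(hits, key=score, reverse=True)]


alphabetical = by_title(RECORDS)
ranked = by_term_frequency(RECORDS, "cod")
print(alphabetical)
print(ranked)
```

The alphabetical order is predictable from the titles alone, while the ranked order depends entirely on the scoring rule chosen, which is one way to read the complaint that rankings differ from system to system.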

    Some search interfaces are so poorly designed or so confusing to use that the search interface itself can be a cause of search fatigue. Some search engines default to the Boolean "or"; others default to the Boolean "and." Moreover, poor data quality in a database, such as spelling and typographical errors, contributes to search fatigue because it can cause some resources not to appear in the search results list, rendering them virtually unfindable. Data in a database is also often missing or incomplete. A searcher cannot find something if it isn't there, but it may take the searcher a fatigue-filled hour to come to this conclusion.
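
To see why the default Boolean operator matters, consider this minimal Python sketch. It assumes a tiny invented collection of three documents and a bare-bones inverted index; the function names and the `default_operator` parameter are hypothetical and do not describe any particular search product.

```python
from collections import defaultdict

# Hypothetical three-document collection, invented for illustration.
DOCS = {
    1: "repairing water leaks in copper pipe",
    2: "press leaks and government secrecy",
    3: "copper pipe fittings catalog",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query, default_operator="and"):
    """Combine the per-term result sets with the default Boolean operator."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if default_operator == "and":
        return set.intersection(*postings)
    if default_operator == "or":
        return set.union(*postings)
    raise ValueError("default_operator must be 'and' or 'or'")


INDEX = build_index(DOCS)
and_hits = search(INDEX, "leaks pipe", default_operator="and")
or_hits = search(INDEX, "leaks pipe", default_operator="or")
print(sorted(and_hits))  # [1]
print(sorted(or_hits))   # [1, 2, 3]
```

The same two-word query returns one document under "and" and all three under "or," so a searcher who does not know which default is in force cannot tell whether a short or long result list reflects the collection or the interface.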

    The searcher himself can also be a source of search fatigue. A searcher may consistently misspell a search term, turning up only resources that contain the wrongly spelled term. The searcher may also be unfamiliar with keyword searching and not know how to effectively use even the most simple search interface. A common error among novice searchers is to enter too broad a search term, such as "art" when they really want information about, for example, 19th-century French art.

    Keyword vs. metadata-enabled

    Sometimes keyword searching performs well. For instance, if you're searching for a rare word in a large database, a keyword search is probably going to be a quick and easy way to find that term. A metadata-enabled search engine is one that searches metadata rather than full text to generate search results, such as an online catalog. The great advantage of metadata is that it compensates for all the weaknesses of keyword searching. A controlled vocabulary provides consistency for subject headings, so the person searching for information about false teeth is referred to dentures. And every document that contains information about false teeth or dentures in any language or by any other name is assigned the subject heading "Dentures" so that they all will be retrieved in a search on this topic. In this way, the search is comprehensive, and no relevant information is excluded from the results.
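
The difference can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the three catalog records, the `SEE_REFERENCES` table that points variant terms to one authorized heading, and the two function names are hypothetical, not taken from any real catalog or thesaurus.

```python
# Hypothetical catalog records; titles, text, and headings are invented.
RECORDS = [
    {"title": "Caring for your dentures",
     "text": "cleaning dentures daily",
     "subjects": {"Dentures"}},
    {"title": "A history of false teeth",
     "text": "false teeth carved from ivory",
     "subjects": {"Dentures"}},
    {"title": "Les prothèses dentaires",
     "text": "entretien des prothèses dentaires",
     "subjects": {"Dentures"}},
]

# Hypothetical "see" references: variant terms lead to one authorized heading.
SEE_REFERENCES = {
    "false teeth": "Dentures",
    "dentures": "Dentures",
}


def keyword_search(records, query):
    """Full-text match: finds only records whose text contains the query."""
    q = query.lower()
    return [r["title"] for r in records if q in r["text"].lower()]


def subject_search(records, query):
    """Controlled-vocabulary match: resolve the query to a heading first."""
    heading = SEE_REFERENCES.get(query.lower())
    if heading is None:
        return []
    return [r["title"] for r in records if heading in r["subjects"]]


keyword_hits = keyword_search(RECORDS, "false teeth")
subject_hits = subject_search(RECORDS, "false teeth")
print(keyword_hits)
print(subject_hits)
```

In this toy collection the keyword search for "false teeth" finds one record, while the subject search finds all three, including the French-language one, because a cataloger assigned each of them the same heading. The cost, which the sketch hides, is the human work of assigning those headings in the first place.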

    To better understand the strengths and weaknesses of keyword and metadata-enabled searching, it helps to divide searching into casual information-seeking and serious information-seeking. Keyword searching can be adequate when a complete search result is not needed; when one or two resources, regardless of their quality, are sufficient; and when the information isn't a crucial need for the searcher. Keyword searching functions poorly, however, for serious information-seeking, which requires highly relevant and precise results. It involves searches that relate to scholarship in medicine, business, and other fields where exhaustive search results are needed that are not polluted with irrelevant data.

    Gresham's Law

    The shift that the library world is now going through from metadata-enabled searching to keyword searching is a case of Gresham's Law in action. Gresham's Law was named for Sir Thomas Gresham, a 16th-century economist. In those days, people would sometimes cut or scrape off some of the metal from coins, and Gresham observed that when different coins with the same face value were in circulation, people hoarded the better coins, that is, the ones with a higher metal content, and the less preferable coins with the lower metal content became far more common. Although all the coins had the same face value, Gresham found that people kept the good ones and used the bad ones for buying and selling. Another way of stating Gresham's Law is, "The bad drives out the good."

    Many people think all types of searches have the same face value. Keyword searching is becoming extremely popular and is essentially beginning the process of replacing metadata-enabled searching, such as online catalogs. If this process continues, metadata-enabled searching will become a high-priced specialty service, one that is not generally available. Keyword searching, with all its flaws and weaknesses, will dominate and become the only type of search available. We are observing Gresham's Law firsthand: Cheap and abundant keyword searching is beginning to replace metadata-enabled searching. The bad is driving out the good.

    Because keyword searching is so prevalent, librarians can help searchers make the best of keyword searching by helping them learn how to maximize this type of search. We should teach patrons that keyword searching, despite its many flaws, does have some uses and can sometimes be an effective tool for information discovery and retrieval, especially in casual information-seeking.

    However, search fatigue will certainly become more common as keyword searching becomes the main means of information discovery, as metadata-enabled search engines become fewer and fewer, and as full-text databases start to be measured in terabytes and petabytes rather than megabytes and gigabytes. Librarians can work to preserve the high-quality searching that metadata and controlled vocabularies help provide. We can continue to devote resources to metadata creation and to metadata-enabled search engines, both of which will be crucial for information discovery in enormous databases. But the most valuable work that librarians can perform is to explain to searchers the great value of metadata and metadata-enabled search engines. Perhaps by doing this we can save metadata-enabled searching from the extinction to which it is now heading.

    The great advantage of metadata is that it compensates for all the weaknesses of keyword searching and helps eliminate search fatigue.