Indexing & retrieval. Approaches to indexing Key word indexing Concept indexing Social indexing Non-text indexing

  • Indexing & retrieval

  • Approaches to indexing: key word indexing; concept indexing; social indexing; non-text indexing

  • Keyword Indexing

  • Keyword indexing (1)
    Advantages: quick; entity-oriented (terms drawn from the entity itself)
    Example: "How to succeed in graduate school"

  • Keyword indexing (1)
    Advantages: quick; entity-oriented (terms drawn from the entity itself); inexpensive; no vocabulary lag; multiple access points; accuracy; no intellectual effort needed

  • Keyword indexing (2)
    Disadvantages: no control over synonyms and near-synonyms; no control over homographs

  • Keyword indexing (3)
    Disadvantages: dependent on authors for informative and accurate titles
    Examples: "Artificial metalloenzymes based on the biotin-avidin technology: enantioselective catalysis and beyond"; "The golden peaches of Samarkand"

  • Keyword indexing (4)
    Disadvantages: no control over word forms
    Example: "Communicating in the library" or "Communications in libraries"

  • Keyword indexing (5)
    Disadvantages: no cross-reference structure

  • Historical key word indexing methodologies: uniterm cards; edge-notched cards; optical coincidence cards; key word in context (KWIC); spatial indexing

  • Pre- versus post-coordinate indexing (Mortimer Taube)
    Pre-coordinate (12 terms): China--Folklore, China--History, China--Politics, France--Folklore, France--History, France--Politics, Germany--Folklore, Germany--History, Germany--Politics, Russia--Folklore, Russia--History, Russia--Politics
    Post-coordinate (7 terms): China, France, Germany, Russia, Folklore, History, Politics
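The term counts on this slide follow from simple arithmetic: pre-coordination needs one heading per place/topic pair (4 x 3), while post-coordination stores each term once (4 + 3). A minimal sketch, assuming a `--` separator for the combined headings (the separator is an assumption, since the punctuation did not survive in the slide text):

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
topics = ["Folklore", "History", "Politics"]

# Pre-coordinate: every place/topic combination is its own heading.
pre_coordinate = [f"{place}--{topic}" for place, topic in product(places, topics)]

# Post-coordinate: each term is stored once and combined at search time.
post_coordinate = places + topics

print(len(pre_coordinate), len(post_coordinate))  # 12 7
```

Adding a fifth place would add three pre-coordinate headings but only one post-coordinate term.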

  • Post-coordinate index searching
    Query "History of France" becomes France + History
    Two sets of documents: France, History
    Boolean AND search yields the intersection of the two sets: France AND History
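The Boolean AND described here can be sketched with Python sets. The document numbers and the `search_and` helper below are hypothetical, chosen only to show how one posting set per term is intersected at search time:

```python
from collections import defaultdict

# Hypothetical documents and the single-concept terms assigned to each.
documents = {
    1: {"France", "History"},
    2: {"France", "Politics"},
    3: {"Germany", "History"},
    4: {"France", "Folklore", "History"},
}

# Post-coordinate index: one posting set per term.
index = defaultdict(set)
for doc_id, terms in documents.items():
    for term in terms:
        index[term].add(doc_id)

def search_and(*terms):
    """Return the documents posted under every given term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(search_and("France", "History")))  # [1, 4]
```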

  • Advantages to Taube's system
    No need to develop a list of authorized terms (terms are pulled from the documents themselves)
    No need to articulate rules of punctuation for representing complex concepts (France--History)
    No need to delineate citation order (France--History v. History--France)
    No need to formulate rules for subheadings ("May subdivide geog.")

  • Uniterm cards: one card per term
    Document no. 102: "Arrest statistics of the Arizona State Police"
    state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
    police: 11 102 23 85 96 87 68 49 60 91 115 107 79

  • Searching with uniterm cards
    Query: looking for documents about state police
    state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
    police: 11 102 23 85 96 87 68 49 60 91 115 107 79
    102: Arrest statistics of the Arizona State Police.
    107: A short history of the Wisconsin State Police.
    115: The modern police state.
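The card comparison on this slide amounts to intersecting two posting lists. The sketch below uses the numbers from the slide; splitting the fused runs "7034" and "6091" into 70, 34 and 60, 91 is an assumption based on the cards' terminal-digit columns. It also checks the three titles the slide provides, which shows why document 115 is a false coordination:

```python
# Posting lists as read from the slide (see the stated assumption above).
state = {31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109}
police = {11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79}

# Titles the slide gives for three of the matching documents.
titles = {
    102: "Arrest statistics of the Arizona State Police.",
    107: "A short history of the Wisconsin State Police.",
    115: "The modern police state.",
}

# Numbers that appear on both cards.
matches = sorted(state & police)
print(matches)  # [49, 68, 96, 102, 107, 115]

# Coordination ignores word order, so check the phrase to spot false drops.
for doc_id in matches:
    title = titles.get(doc_id)
    if title is None:
        continue  # the slide shows no title for this number
    verdict = "relevant" if "state police" in title.lower() else "false drop"
    print(doc_id, verdict)
```

Documents 102 and 107 are about state police; 115 contains both words but is about a police state.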

  • Edge-notched cards: one card per bibliographic item
    Card: Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45
    Card: Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8
    Notch labels: bears, pet-care, pterodactyls

  • Pyramid coding for edge-notched cards: coding the year 1947*
    20 dots: two direct-coded fields, each 0 1 2 3 4 5 6 7 8 9
    10 dots: two pyramid-coded fields, each 9 5 2 0 / 8 4 1 / 7 3 / 6
    *They hadn't heard of the Y2K problem yet.

  • Optical coincidence cards: pre-printed cards with numbers for the entire database (0 through 99 on the sample card, which is labeled "fleas")

  • Key Word in Context (KWIC) Index
    Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"
    CONTEXT | KEY WORDS | POINTER
    ttems of remote users: an | analysis of the types of | 15
    hit rates for monograph/ A | comparison of OCLC and WLN | 15
    comparison of OCLC and WLN | hit rates for monographs and / | 15
    OCLC and WLN hit rates for | monographs and an analysi/ | 15
    onographs/ A comparison of | OCLC and WLN hit rates for | 15
    arison of OCLC and WLN hit | rates for monographs and / | 15
    n analysis of the types of | records retrieved. A com/ | 15
    s of the types of records | retrieved. A comparison / | 15
    phs and an analysis of the | types of records retrieve/ | 15
    A comparison of OCLC and | WLN hit rates for monogra/ | 15
    Callouts on the slide: Stopword, Stopword
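A KWIC index of this kind can be generated mechanically from a title. The sketch below is a simplified illustration, not the program behind the slide: the stopword list, the 26-character column width, and the `kwic` function are assumptions, and it truncates the context instead of wrapping the title around as the slide's index does.

```python
STOPWORDS = {"a", "an", "and", "for", "of", "the"}  # assumed stopword list

def kwic(title, doc_id, width=26):
    """Return (keyword, left context, keyword plus right context, pointer) rows."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        key = word.strip('.,:;"').lower()
        if key in STOPWORDS:
            continue  # stopwords never become access points
        left = " ".join(words[:i])[-width:]   # text before the keyword, right-truncated
        right = " ".join(words[i:])[:width]   # keyword and what follows, left-truncated
        entries.append((key, left, right, doc_id))
    entries.sort(key=lambda entry: entry[0])  # file rows alphabetically by keyword
    return entries

title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")

for key, left, right, doc_id in kwic(title, 15):
    print(f"{left:>26} | {right:<26} | {doc_id}")
```

With this stopword list the title yields ten access points, filed in the same alphabetical order as the key words on the slide.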

  • Key Word Out of Context (KWOC) Index
    aardvark: 101
    baggage: 123
    banyan: 128, 159, 179
    coconut: 955, 654
    driving: 196, 488, 788
    elementary: 455, 785
    elephant: 128, 465, 783
    garage: 678, 398
    hardware: 849, 483, 399
    meter: 768
    nadir: 877
    noxious: 112
    opium: 289
    opus: 985, 159, 849
    people: 629, 458
    quark: 137, 492
    radar: 968, 295
    radio: 430, 206, 749
    stereo: 294, 837, 873
    television: 745, 727, 883
    ultraviolet: 958, 774
    zebra: 276
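A KWOC index is essentially an alphabetical list of keywords, each followed by the numbers of the documents that contain it. As a rough sketch, the three titles from the uniterm slide can be turned into such a list; the stopword set and the `kwoc` function are assumptions made for illustration:

```python
from collections import defaultdict

STOPWORDS = {"a", "of", "the"}  # assumed stopword list

titles = {
    102: "Arrest statistics of the Arizona State Police.",
    107: "A short history of the Wisconsin State Police.",
    115: "The modern police state.",
}

def kwoc(titles):
    """Map each non-stopword title word to the documents that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(titles):
        for word in titles[doc_id].lower().rstrip(".").split():
            if word not in STOPWORDS and doc_id not in index[word]:
                index[word].append(doc_id)
    return dict(sorted(index.items()))  # alphabetical, like the printed index

for keyword, pointers in kwoc(titles).items():
    print(keyword, ", ".join(str(p) for p in pointers))
```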

  • Vector space model (VSM): each document is represented by a vector
    Figure: axes technology, libraries, assistive; vector for the document entitled "Assistive technology for libraries"

  • Vector space model matching: similarity between query and document vectors
    Figure: axes technology, libraries, assistive; vectors for document 1, document 2, and the query
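One common way to measure the similarity between a query vector and a document vector is the cosine of the angle between them. The slides do not say which measure they intend, so cosine similarity is an assumption here, as are the query text and the second document title:

```python
import math

AXES = ("technology", "libraries", "assistive")  # the three axes in the figure

def to_vector(text):
    """Count how often each axis term occurs in the text."""
    words = text.lower().split()
    return [words.count(term) for term in AXES]

def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = to_vector("assistive technology")               # hypothetical query
doc1 = to_vector("Assistive technology for libraries")  # title from the slide
doc2 = to_vector("Libraries and their buildings")       # hypothetical title

print(round(cosine(query, doc1), 3))  # 0.816
print(round(cosine(query, doc2), 3))  # 0.0
```

Document 1 points in nearly the same direction as the query, so it ranks above document 2.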

  • VSM term weighting: assign high weights to terms that appear frequently in the document but infrequently in the database
    Query: "I'm looking for articles about assistive technology for the blind."
    Freq. within document: low | high | high
    No. of documents with term: high | high | low
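The weighting rule can be illustrated with the widely used tf-idf formula, weight = tf x log(N / df). This is one common formulation and not necessarily the one the slides have in mind; the counts below are hypothetical stand-ins for the three low/high cases in the table:

```python
import math

def tf_idf(term_freq, docs_with_term, total_docs):
    """Term frequency scaled by the log of the inverse document frequency."""
    return term_freq * math.log(total_docs / docs_with_term)

TOTAL_DOCS = 1000  # hypothetical database size

# (frequency within document, number of documents with term)
cases = {
    "low freq, many docs": (1, 900),
    "high freq, many docs": (10, 900),
    "high freq, few docs": (10, 5),
}

weights = {label: tf_idf(tf, df, TOTAL_DOCS) for label, (tf, df) in cases.items()}
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

Only the third case, frequent in the document but rare in the database, receives a large weight.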

  • VSM refinements: adding semantic and syntactical parsing
    Bill is going to the store to make a purchase.
    Bill is going to purchase the store.
    Bill is going to store his purchase.

  • Concept indexing

  • Concept indexing
    Rather than pulling terms from documents, assign a concept identifier (e.g. France--History) to documents dealing with the history of France
    Requires intellectual effort
    Takes more time than key word indexing, so less economical
    Avoids problems of false coordination and synonymy through use of vocabulary control

  • Vocabulary control (1): one indexing term or phrase to represent a concept
    Unidentified flying objects, not flying saucers
    Point user to correct term with "use" reference
    Reduces number of searches needed to find items about a particular topic

  • Vocabulary control (2): one form of a word to represent the concept
    Dictionaries, not dictionary

  • Vocabulary control (3): one usage of a homographic term
    Fault (geologic), not fault (responsibility for error)
    Usage identified through scope note
    Consistency among indexers as well as one indexer over time
    Helps user to avoid false drops

  • Vocabulary control (4): syndetic structure
    Broader terms
    Narrower terms
    Related terms (see also)
    User can negotiate structure to find most appropriate term, as well as identify additional related terms of potential use in finding relevant documents
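The "use" references and the broader/narrower/related structure can be pictured as two small lookup tables. In the sketch below only the two "use" pairs come from the slides; the BT/NT/RT entries and the helper names are hypothetical:

```python
# "Use" references: entry term (lowercased) -> preferred heading.
USE = {
    "flying saucers": "Unidentified flying objects",
    "dictionary": "Dictionaries",
}

# Syndetic structure for one heading; all three lists are hypothetical.
RELATIONS = {
    "Unidentified flying objects": {
        "BT": ["Unexplained phenomena"],          # broader term
        "NT": ["Unidentified flying object sightings"],  # narrower term
        "RT": ["Extraterrestrial life"],          # related term (see also)
    },
}

def preferred(term):
    """Follow a 'use' reference if one exists, else keep the term as given."""
    return USE.get(term.lower(), term)

def expand(term):
    """Return the preferred heading and any related headings worth trying."""
    heading = preferred(term)
    return heading, RELATIONS.get(heading, {})

heading, related = expand("Flying saucers")
print(heading)        # Unidentified flying objects
print(related["RT"])  # ['Extraterrestrial life']
```

A searcher who starts from the non-preferred term is routed to a single heading and can then move along the structure to neighboring headings.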

  • Social network indexing: tags; tag clouds; user-created tags providing access to library resources

  • flickr

  • Tags

  • Tags: architecture Bohemian South Country Czech Republic Europe European historical medieval old Old Town Other Keywords River Snow town Vltava

  • Tags (177,583 photos)

  • Tag clouds

  • Geotagging

  • Librarian tagging

  • Library using flickr

  • Peace Palace Library (PPL)

  • Social bookmarking:


  • economic case for open access in academic publishing

    technology
    Portable software for USB drives
    CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch

  • University of Pennsylvania

  • PennTags

  • Item list with PennTags

  • Adding a PennTag: Add to PennTags

  • Non-text indexing

  • Indexing Music

  • Indexing music - transcription: 1 1 5 5 6 6 5

  • Indexing Music - melodic contour: *RURURD (drawn as *-/-/-\)
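The contour string can be derived from the transcription by comparing each note with the one before it. The sketch below reads U, D, and R as up, down, and repeat, with * marking the first note, as in Parsons code; the function name is an assumption:

```python
def melodic_contour(pitches):
    """Encode a pitch sequence as * followed by U (up), D (down), or R (repeat)."""
    symbols = ["*"]  # the first note has nothing to be compared with
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("U")
        elif current < previous:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)

# Scale degrees from the transcription slide.
print(melodic_contour([1, 1, 5, 5, 6, 6, 5]))  # *RURURD
```

Because only the direction of each step is kept, the same contour results in any key and regardless of the exact interval sizes.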

  • Query by humming

  • Query by humming (2)
    Diagram labels: hummed queries; digital audio; pitch tracker; melodic contour; query engine; ranked list of matching melodies; MIDI songs; melody database
    Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Brian C. Smith. 1995. Query by humming: musical information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings.
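The query engine's job is to return a ranked list of melodies whose contours resemble the hummed contour, even when the humming contains mistakes. As a rough sketch, a plain Levenshtein edit distance stands in for the matcher here; that choice is an assumption and not the algorithm from the cited paper, and the melody database is hypothetical:

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

# Hypothetical melody database: song name -> stored contour.
melody_db = {
    "Song A": "*RURURD",
    "Song B": "*UDUDRR",
    "Song C": "*DDRUUD",
}

def rank(hummed_contour):
    """Return (song, distance) pairs, closest contour first."""
    scored = [(name, edit_distance(hummed_contour, contour))
              for name, contour in melody_db.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))

ranked = rank("*RURUUD")  # a hummed query with one wrong step
print(ranked[0])  # ('Song A', 1)
```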

  • Indexing Music - melodic contour: *RURURD

  • Indexing images. Source: Trust Territory archives.

  • Indexing images - chair (1)

  • Indexing images - ?

  • Indexing images - chair (2)

  • Biometrics - face

  • Biometrics - differences

  • Biometrics - similarities: look at ratios of distances between marker points
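Ratios of distances are useful because they stay the same when a face is photographed at a different size. A minimal sketch, assuming hypothetical marker names and pixel coordinates, with the distance between the eyes as the reference length:

```python
import math
from itertools import combinations

def ratio_signature(markers, reference=("left_eye", "right_eye")):
    """Return every pairwise marker distance divided by a reference distance."""
    ref_len = math.dist(markers[reference[0]], markers[reference[1]])
    return {
        (a, b): math.dist(markers[a], markers[b]) / ref_len
        for a, b in combinations(sorted(markers), 2)
    }

# Hypothetical marker coordinates in pixels.
face = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose": (50.0, 65.0),
    "chin": (50.0, 110.0),
}

# The same face photographed at 2.5 times the size.
enlarged = {name: (x * 2.5, y * 2.5) for name, (x, y) in face.items()}

original_sig = ratio_signature(face)
enlarged_sig = ratio_signature(enlarged)
same = all(math.isclose(original_sig[pair], enlarged_sig[pair]) for pair in original_sig)
print(same)  # True
```

Raw distances would differ between the two photographs, but the ratios match, which is what makes them usable as an index key.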

  • Indexing images: color; layout; shape

  • Indexing images by color

  • Indexing images by color: http://w
