Indexing & retrieval
Approaches to indexing
Key word indexing
Concept indexing
Social indexing
Non-text indexing
Keyword Indexing
Keyword indexing (1)
Advantages:
• Quick
• Entity-oriented - draw terms from entity itself
• Inexpensive
• No vocabulary lag
• Multiple access points
• Accuracy
• No intellectual effort needed
Example title: "How to succeed in graduate school"
Keyword indexing (2)
Disadvantages:
• No control over synonyms, near synonyms
• No control over homographs
Keyword indexing (3)
Disadvantages:
• Dependent on authors for informative and accurate titles
Examples:
"Artificial metalloenzymes based on the biotin-avidin technology: enantioselective catalysis and beyond"
"The golden peaches of Samarkand"
Keyword indexing (4)
Disadvantages:
• No control over word forms
"Communicating in the library" or "Communications in libraries"
Keyword indexing (5)
Disadvantages:
• No cross reference structure
Historical key word indexing methodologies
Uniterm cards
Edge-notched cards
Optical coincidence cards
Key word in context (KWIC)
Spatial indexing
Pre- versus post-coordinate indexing
Mortimer Taube
Pre-coordinate (12 terms):
China—Folklore
China—History
China—Politics
France—Folklore
France—History
France—Politics
Germany—Folklore
Germany—History
Germany—Politics
Russia—Folklore
Russia—History
Russia—Politics
Post-coordinate (7 terms):
China, France, Germany, Russia, Folklore, History, Politics
Post-coordinate index searching
History of France → France + History
Two sets of documents
Boolean AND search yields intersection of the two sets
[Venn diagram: circles France and History; the overlap is France AND History]
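A minimal sketch of the Boolean AND search described above, assuming the index is held as one set of document numbers per term. The document numbers and the names `postings` and `boolean_and` are invented for illustration.

```python
# Post-coordinate index: each single term maps to the set of documents
# that carry it. Document numbers here are invented for illustration.
postings = {
    "France":   {1, 4, 7, 9},
    "Germany":  {2, 4, 8},
    "History":  {3, 4, 5, 9},
    "Folklore": {1, 6},
}


def boolean_and(index, *terms):
    """Return the documents indexed under every one of the given terms."""
    sets = [index.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.intersection(*sets)


# "History of France" becomes France AND History at search time.
result = boolean_and(postings, "France", "History")
print(sorted(result))
```

The terms are combined only when the search is run, which is why the index itself needs no compound headings.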
Advantages to Taube's system
No need to develop a list of authorized terms: terms are pulled from the documents themselves
No need to articulate rules of punctuation for representing complex concepts (France—History)
No need to delineate citation order (France—history v. History—France)
No need to formulate rules for subheadings ("May subdivide geog.")
Uniterm cards
One card per term
Document no. 102: "Arrest statistics of the Arizona State Police"
Card "state": 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
Card "police": 11 102 23 85 96 87 68 49 60 91 115 107 79
Searching with uniterm cards
Query: looking for documents about state police
Card "state": 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
Card "police": 11 102 23 85 96 87 68 49 60 91 115 107 79
Document numbers on both cards include:
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.
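The card comparison above can be sketched as follows. The document numbers are the ones printed on the two cards; posting them into ten columns by final digit is an assumption about the card layout, and `make_card` and `coincidences` are hypothetical helper names.

```python
from collections import defaultdict


def make_card(doc_numbers):
    """Post document numbers into ten columns keyed by terminal digit."""
    card = defaultdict(list)
    for number in doc_numbers:
        card[number % 10].append(number)
    return card


def coincidences(card_a, card_b):
    """Compare two cards column by column and return the shared numbers."""
    shared = []
    for digit in range(10):
        shared.extend(n for n in card_a[digit] if n in card_b[digit])
    return sorted(shared)


state = make_card([31, 102, 53, 24, 75, 96, 107, 68, 49, 70,
                   34, 95, 117, 59, 115, 147, 109])
police = make_card([11, 102, 23, 85, 96, 87, 68, 49, 60,
                    91, 115, 107, 79])

matches = coincidences(state, police)
print(matches)
```

Document 115, "The modern police state", appears among the matches even though it is not about state police: matching on separate terms ignores how the words relate to each other.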
Edge-notched cards
One card per bibliographic item
Edge positions: bears, pet-care, pterodactyls
Card: Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45
Card: Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8
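A rough sketch of how a needle pass over edge-notched cards selects items. Which positions are notched on each card is an assumption inferred from the two titles, and `needle_sort` is a hypothetical name.

```python
# Each card records which edge positions have been notched open.
# The notch assignments are assumed from the titles, not from the slide.
cards = [
    {"title": "Caring for your pet pterodactyl",
     "notched": {"pet-care", "pterodactyls"}},
    {"title": "Caring for your pet grizzly",
     "notched": {"bears", "pet-care"}},
]


def needle_sort(deck, position):
    """Pass a needle through one hole position and lift the deck.

    Cards notched at that position drop out; the rest stay on the needle.
    """
    dropped = [card for card in deck if position in card["notched"]]
    retained = [card for card in deck if position not in card["notched"]]
    return dropped, retained


dropped, retained = needle_sort(cards, "bears")
print([card["title"] for card in dropped])

# A second pass over the dropped cards acts as a Boolean AND.
both, _ = needle_sort(dropped, "pet-care")
print([card["title"] for card in both])
```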
Pyramid coding for edge-notched cards
Coding the year 1947*
*They hadn't heard of the Y2K problem yet.
20 dots: 0 1 2 3 4 5 6 7 8 9 | 0 1 2 3 4 5 6 7 8 9
10 dots: [two digit pyramids, each holding the digits 0 to 9]
[Chart: the numbers 0 to 99 in rows of ten]
Optical coincidence cards
Pre-printed cards with numbers for entire database
Card term: fleas
Key Word in Context (KWIC) Index
Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"

CONTEXT                     KEY WORDS                       POINTER
 ttems of remote users: an  analysis of the types of        15
 hit rates for monograph/A  comparison of OCLC and WLN      15
comparison of OCLC and WLN  hit rates for monographs and /  15
OCLC and WLN hit rates for  monographs and an analysi/      15
onographs/ A comparison of  OCLC and WLN hit rates for      15
arison of OCLC and WLN hit  rates for monographs and /      15
n analysis of the types of  records retrieved. A com/       15
 s of the types of records  retrieved. A comparison /       15
phs and an analysis of the  types of records retrieve/      15
  A comparison of OCLC and  WLN hit rates for monogra/      15

[Callouts: Stopword, Stopword]
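A minimal sketch of how KWIC entries like those above could be generated. The stopword list and the 26-character column width are assumptions. Unlike the printed index, this version does not wrap the end of the title around into the context column.

```python
# Assumed stopword list; a real index would use a longer one.
STOPWORDS = {"a", "an", "and", "for", "of", "the"}


def kwic(title, doc_id, width=26):
    """Build KWIC entries as (left context, key word onward, pointer)."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOPWORDS:
            continue  # stopwords never become key words
        left = " ".join(words[:i])[-width:]
        right = " ".join(words[i:])[:width]
        entries.append((left, right, doc_id))
    # File entries alphabetically by key word, ignoring case.
    return sorted(entries, key=lambda entry: entry[1].lower())


title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")
for left, right, pointer in kwic(title, 15):
    print(f"{left:>26}  {right:<26}  {pointer}")
```

Every non-stopword in the title becomes an access point, which is the "multiple access points" advantage of keyword indexing.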
Key Word Out of Context (KWOC) Index
aardvark 101
baggage 123
banyan 128, 159, 179
coconut 955, 654
driving 196, 488, 788
elementary 455, 785
elephant 128, 465, 783
garage 678, 398
hardware 849, 483, 399
meter 768
nadir 877
noxious 112
opium 289
opus 985, 159, 849
people 629, 458
quark 137, 492
radar 968, 295
radio 430, 206, 749
stereo 294, 837, 873
television 745, 727, 883
ultraviolet 958, 774
zebra 276
Vector space model (VSM)
Each document represented by a vector
[Figure: term space with axes technology, libraries, and assistive, showing the vector for the document entitled "Assistive technology for libraries"]
Vector space model matching
Similarity between query and document vectors
[Figure: term space with axes technology, libraries, and assistive, showing the vector for document 1, the vector for document 2, and the vector for the query]
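One common way to measure the similarity between a query vector and a document vector is the cosine of the angle between them. The sketch below assumes simple term counts on the three axes from the figure; the second document title is invented for contrast.

```python
import math

# Axes of the term space, in a fixed order.
TERMS = ["technology", "libraries", "assistive"]


def to_vector(text):
    """Count how often each axis term occurs in the text."""
    words = text.lower().split()
    return [words.count(term) for term in TERMS]


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query = to_vector("assistive technology")
doc1 = to_vector("Assistive technology for libraries")
doc2 = to_vector("Funding for public libraries")  # invented title

scores = [("doc1", cosine(query, doc1)), ("doc2", cosine(query, doc2))]
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Documents are returned in ranked order rather than as a yes/no Boolean match.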
VSM term weighting
Assign high weights to terms that appear frequently in the document but infrequently in the database
Query: "I'm looking for articles about assistive technology for the blind."

Term          Freq. w/in document   No. of documents with term
conclusion    low                   high
information   high                  high
blind         high                  low
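The slide does not name a formula. One widely used weighting that behaves this way is tf-idf, sketched here with invented counts chosen to mirror the low/high pattern in the table.

```python
import math


def tf_idf(term_count, docs_with_term, total_docs):
    """Weight = term frequency x log(total docs / docs containing the term)."""
    return term_count * math.log(total_docs / docs_with_term)


TOTAL_DOCS = 10_000  # hypothetical database size

# (frequency within document, number of documents with term); all invented.
examples = {
    "conclusion":  (1, 9_000),   # low frequency, in most documents
    "information": (12, 9_000),  # high frequency, in most documents
    "blind":       (12, 50),     # high frequency, in few documents
}

weights = {term: tf_idf(tf, df, TOTAL_DOCS)
           for term, (tf, df) in examples.items()}
for term, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<12} {weight:8.2f}")
```

Only "blind" is both frequent in the document and rare in the database, so it receives by far the highest weight.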
VSM refinements
Adding semantic and syntactical parsing.
Bill is going to the store to make a purchase.
Bill is going to purchase the store.
Bill is going to store his purchase.
Concept indexing
Rather than pulling terms from documents, assign concept identifier (e.g. France—History) to documents dealing with history of France
Requires intellectual effort
Takes more time than key word indexing so less economical
Avoids problems of false coordination and synonymy through use of vocabulary control
Vocabulary control (1)
One indexing term or phrase to represent a concept
– Unidentified flying objects not flying saucers
– Point user to correct term with "use" reference
– Reduces number of searches needed to find items about a particular topic
Vocabulary control (2)
One form of a word to represent the concept
– Dictionaries not dictionary
Vocabulary control (3)
One usage of a homographic term
– Fault (geologic) not fault (responsibility for error)
– Usage identified through scope note
– Consistency among indexers as well as one indexer over time
– Helps user to avoid false drops
Vocabulary control (4)
Syndetic structure
– Broader terms
– Narrower terms
– Related terms (see also)
– User can negotiate structure to find most appropriate term, as well as identify additional related terms of potential use in finding relevant documents
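A toy sketch of a controlled vocabulary with "use" references and a syndetic structure. Only the two "use" references come from the slides; every broader, narrower and related term is invented for illustration, as are the names `USE`, `THESAURUS`, `authorized` and `suggest`.

```python
# Non-preferred term -> authorized term ("use" references).
USE = {
    "flying saucers": "unidentified flying objects",
    "dictionary": "dictionaries",
}

# BT = broader terms, NT = narrower terms, RT = related terms (see also).
# All BT/NT/RT entries below are invented examples.
THESAURUS = {
    "unidentified flying objects": {
        "BT": ["unexplained phenomena"],
        "NT": [],
        "RT": ["extraterrestrial life"],
    },
    "dictionaries": {
        "BT": ["reference works"],
        "NT": ["bilingual dictionaries"],
        "RT": ["encyclopedias"],
    },
}


def authorized(term):
    """Follow a 'use' reference, if any, to the authorized term."""
    term = term.lower().strip()
    return USE.get(term, term)


def suggest(term):
    """Return the authorized term plus its BT, NT and RT links."""
    heading = authorized(term)
    links = THESAURUS.get(heading, {"BT": [], "NT": [], "RT": []})
    return heading, links


heading, links = suggest("Flying saucers")
print(heading, links)
```

Because every synonym resolves to one heading, a single search retrieves everything indexed under that concept.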
Social network indexing
• Tags
• Tag clouds
• User-created tags providing access to library resources
flickr
http://www.flickr.com/
Tags
architecture Bohemian South Country Czech Republic Europe European historical medieval old Old Town Other Keywords River Snow town Vltava
(177,583 photos)
Tag clouds
Geotagging
Librarian tagging
Library using flickr
Peace Palace Library (PPL)
Social bookmarking: http://www.delicious.com
http://www.delicious.com/mauicclibrary
The economic case for open access in academic publishing
technology
Portable software for USB drives
CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch
University of Pennsylvania
http://www.library.upenn.edu/
PennTags
Item list with PennTags
Adding a PennTag
Add to PennTags
Non-text indexing
Indexing Music
Indexing music - transcription
1 1 5 5 6 6 5
Indexing Music - melodic contour
* R U R U R D
[Contour drawing: - / - / - \]
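The contour string can be derived from the transcription by comparing each note with the one before it. This encoding is often called Parsons code. The sketch assumes pitches are given as comparable numbers, such as the scale degrees in the transcription.

```python
def melodic_contour(pitches):
    """Encode a melody as a contour string.

    '*' marks the first note; each later note is U (up), D (down)
    or R (repeat) relative to the note before it.
    """
    if not pitches:
        return ""
    symbols = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("U")
        elif current < previous:
            symbols.append("D")
        else:
            symbols.append("R")
    return " ".join(symbols)


print(melodic_contour([1, 1, 5, 5, 6, 6, 5]))  # * R U R U R D
```

Because only the direction of each step is kept, the same contour matches the tune in any key.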
Query by humming
Query by humming (2)
[Diagram: Hummed queries (digital audio) → Pitch tracker → melodic contour → Query engine → Ranked list of matching melodies. MIDI songs → Melody database → Query engine.]
Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Brian C. Smith. 1995. Query by humming--musical Information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings. http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html
Indexing Music - melodic contour
* R U R U R D
http://www.musipedia.org/
RURURD
Indexing images
Source: Trust Territory archives.
Indexing images - chair (1)
Indexing images - ?
Indexing images - chair (2)
Biometrics - face
Biometrics - differences
Biometrics - similarities
Look at ratios of distances between marker points
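A small sketch of why ratios of distances are useful: they stay the same when the same face is photographed at a different size. The marker names and coordinates are hypothetical, and `distance_ratios` is an illustrative helper, not a real biometric algorithm.

```python
import math


def distance_ratios(points):
    """Return all pairwise distances divided by the first pairwise distance."""
    names = sorted(points)
    dists = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dists.append(math.dist(points[a], points[b]))
    base = dists[0]
    return [d / base for d in dists]


# Hypothetical marker points (x, y) on one face image.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose": (50, 60),
    "mouth": (50, 80),
}
# The same face photographed at twice the size.
face_large = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

same = all(math.isclose(a, b)
           for a, b in zip(distance_ratios(face), distance_ratios(face_large)))
print(same)
```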
Indexing images
• Color• Layout• Shape
Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English
Indexing images by shape
http://shape.cs.princeton.edu/search.html
Original
Search by Shape – Commercial Usage
http://www.youtube.com/watch?v=grShwnDXyUA
Search by Color Exercise
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
Title?
Artist?
[Images numbered 1 to 5]