Thesaurus Management Tools
Melissa Riesland / ASIS&T PNC 2002
TermTree
Why control?
The Great Pop vs. Soda Controversy
Source: http://www.ugcs.caltech.edu/~almccon/pop_soda/
Red Hot Chili Peppers
red hot peppers
red hot chilly peppers
red hot chillipepper
red hot chillie peppers
red hot chilli peppers
red hot chilli pepers
red hot chill peppers
red hot chilipeppers
red hot chili pepper
red hot chili pepers
red hot chili
red hotchilli peppers
chili peppers
the red hot chili peppers
red chili peppers
Hierarchical
Parent-Child, or Trees
Cats
  NT Domestic Cats
    NT Persians
  NT Wild Cats
    NT Tigers
Equivalent
Pop, Soda, Coke
Source: http://www.ugcs.caltech.edu/~almccon/pop_soda/
soft drink
tonic
soda pop, soda water
cola, cocola
pepsi
cold drink
carbonated beverage
fizzy drink
sodie, sodie pop, sody pop, sodee
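An equivalence relationship like the one on this slide can be sketched as a lookup from variant terms to a single preferred term. This is only an illustration: the choice of "soft drink" as the preferred term, the USE table, and the function name preferred_term are assumptions, not taken from any product discussed here.

```python
# Hypothetical equivalence table: each variant points to one preferred term.
# Choosing "soft drink" as the preferred term is an assumption for illustration.
USE = {
    "pop": "soft drink",
    "soda": "soft drink",
    "coke": "soft drink",
    "soda pop": "soft drink",
    "fizzy drink": "soft drink",
    "sodie": "soft drink",
}


def preferred_term(query: str) -> str:
    """Return the preferred term for a query.

    The query is lowercased and runs of whitespace are collapsed before
    lookup. Queries with no entry in USE come back normalized but unmapped.
    """
    key = " ".join(query.lower().split())
    return USE.get(key, key)


print(preferred_term("  Soda   Pop "))
print(preferred_term("Tonic water"))
```

A real vocabulary tool would also have to handle misspellings such as those in the query-log list; exact-match lookup only covers variants that were entered by hand.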
classification
A logical system for the arrangement of knowledge.
~~Lois Mai Chan
Source: Cataloging and Classification: An Introduction, 2nd ed., McGraw-Hill, 1994.
classification
Classes and subclasses
Common and distinguishing characteristics
Pre-established principles
Items in a collection or entries in an index, bibliography, or catalog
Access and retrieval
~~Online Dictionary of Library and Information Science (ODLIS)
Source: http://vax.wcsu.edu/library/odlis.html
classification
Symbolic
  Represent classes and subclasses of concepts using letters, numbers, or a combination of the two.
Natural language
  Traditionally represent relationships using nouns and noun phrases.
Controlled Vocabularies
Authority lists, gazetteers, glossaries
Taxonomies
Thesauri
Topic maps
Authority Lists
Christie, Agatha, 1890-1976
100 1_ |a Christie, Agatha, |d 1890-1976
500 1_ |w nnnc |a Mallowan, Agatha Christie, |d 1890-1976
500 1_ |w nnnc |a Westmacott, Mary, |d 1890-1976
Source: http://authorities.loc.gov/
Gazetteers
Source: http://www.gazetteer.de/home.htm
Taxonomies vs. Thesauri
Shared characteristics
Traditional relationships
Specific to topic or collection
Taxonomies vs. Thesauri
Distinguishing characteristics
Information access
Information Access
Browsing, aka buckets
Information Retrieval
Explicit details
Trees & Webs
Taxonomies & Thesauri
Topic Maps
Topic Maps
Facet analysis and synthesis, 260-262, 280, 484
Field tags in MARC (see Fields and subfields; Subfield codes)
Form headings and subdivisions:
  defined, 485
  in LCSH, 176, 183
  in Sears, 213, 218-219
Government bodies and officials, 140-141 (See also Heads of state and governments)
Topics
Occurrences
Associations
Topic Maps
Virginia Woolf wrote To the Lighthouse
Virginia Woolf was married to Leonard Woolf
Leonard Woolf founded the Hogarth Press
The Hogarth Press published T.S. Eliot's The Waste Land
Eliot was influenced by Ezra Pound
Etc. etc.
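The statements on this slide can be sketched as a tiny topic map: topics are nodes, associations are typed links between topics, and occurrences are pointers from a topic to resources. This is a minimal illustration and not the Topic Maps standard's data model; the class names, the related helper, and the occurrence identifier "doc-001" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Pointers to information resources that discuss this topic.
    occurrences: list = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    kind: str    # type of relationship, e.g. "wrote"
    source: str  # name of the topic the association starts from
    target: str  # name of the topic it points to


names = ("Virginia Woolf", "To the Lighthouse", "Leonard Woolf", "Hogarth Press")
topics = {name: Topic(name) for name in names}

associations = [
    Association("wrote", "Virginia Woolf", "To the Lighthouse"),
    Association("was married to", "Virginia Woolf", "Leonard Woolf"),
    Association("founded", "Leonard Woolf", "Hogarth Press"),
]

# Hypothetical occurrence: the map exists on its own, and resources attach to it.
topics["To the Lighthouse"].occurrences.append("doc-001")


def related(name: str) -> list:
    """List (kind, target) pairs for associations that start at the named topic."""
    return [(a.kind, a.target) for a in associations if a.source == name]


print(related("Virginia Woolf"))
```

Because association kinds are free text here, the sketch is not restricted to hierarchical, equivalent, and associative relationships.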
Data integrity
Orphans
Circular references
Misspellings
Inversions
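Two of these checks, orphans and circular references, can be sketched in a few lines. The sample data is hypothetical, and "orphan" is taken here to mean a term whose broader term has no entry of its own; some tools instead use the word for a term with no relationships at all. The function names are assumptions for illustration.

```python
# Hypothetical vocabulary: each term maps to its list of broader terms (BT).
BROADER = {
    "Cats": [],
    "Domestic Cats": ["Cats"],
    "Persians": ["Domestic Cats"],
    "Classic Rock": ["1970s Music"],
    "1970s Music": ["Classic Rock"],  # points back at Classic Rock
    "Tigers": ["Wild Cats"],          # "Wild Cats" has no entry of its own
}


def find_orphans(broader: dict) -> list:
    """Return terms that name a broader term missing from the vocabulary."""
    return sorted(
        term
        for term, parents in broader.items()
        if any(parent not in broader for parent in parents)
    )


def find_cycle_terms(broader: dict) -> set:
    """Return terms that can reach themselves by following BT links upward."""
    on_cycle = set()
    for start in broader:
        stack = list(broader.get(start, []))
        seen = set()
        while stack:
            term = stack.pop()
            if term == start:
                on_cycle.add(start)
                break
            if term in seen:
                continue
            seen.add(term)
            stack.extend(broader.get(term, []))
    return on_cycle


print(find_orphans(BROADER))
print(sorted(find_cycle_terms(BROADER)))
```

Checks like these are the kind of cross-checking a purchased tool would be expected to run automatically on every edit.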
Why buy? Time
Cross-checking & proofreading
Software development time
Why buy? Cost
Time = $$$
In-house development = $$$
Choices, Choices
Automated or manual
Bundled or stand-alone
Single or multi-user
Tasks
Vocabulary construction and maintenance
Search and indexing
Pricing and licenses
Importing, exporting, and reports
Products Referenced in This Presentation
MultiTes - http://www.multites.com/
TermTree - http://www.termtree.com.au/index.html
WebChoir - http://www.webchoir.com/products.html
Synapse - http://www.synaptica.com/
Data Harmony - http://www.dataharmony.com/
For information on more software products and how to select products, see http://www.willpower.demon.co.uk/thessoft.htm.
The End
E-mail: email@example.com
Who am I and what do I do
--taxonomy development
--information retrieval
Why this presentation
--needed a tool
--not enough engineering staff for development, so given go-ahead to purchase out-of-the-box software
--created spreadsheet comparing characteristics of candidate products
Plan
--discuss controlled vocabulary concepts
--why buy?
--what to consider
--product taste testing
People just don't say or spell things the same way.
From a recent gathering of variants in our query logs
I.e., call letters
I.e., controlled vocabularies
Both are hierarchical (trees) and usually have associative and equivalent relationships as well.
Both have applications for indexing, navigation, and search.
Both typically are built with a specific topic area or collection in mind.
Based on traditional indexing concepts.
--Knowledge structures: topics and relations/associations
--Information resources: occurrences
Just as SGML was originally developed for print publishing (this is a header, this is body text), topic maps were originally conceived for representing indexes to complex information. They evolved into a navigational aid that encompasses the characteristics of taxonomies and thesauri, with particular utility for electronic documentation.
The topic map sits above the occurrences. It is not built in response to a body of documents; it is a stand-alone structure of nodes to which occurrences attach. A variety of relationships/associations is available, not limited to the three traditional ones (hierarchical, equivalent, associative).
Government and Politics vs. Politics and Government
Classic Rock 1970s Music
Classic Rock
Variety of products starting at less than $500
Average full-time worker:
--Between $50,000 and $100,000 per year, or $4,167 to $8,333 per month.
--Bureau of Labor Statistics National Compensation Survey: $28-$39 per hour (plus benefits, capital expenses, and other forms of compensation)
Quickly exceed budget
Can be painful, as you have to live with a developing product
--Automated:
 *Little or no human intervention; usually uses rules or training sets; derives vocabulary from the collection itself
 *Sometimes comes with its own built-in vocabulary, which is very broad
--Manual:
 *You do all the work
 *The only automatic features are cross-checking references, global changes, report generation, and sometimes spell-checking, etc.
--Bundled:
 *Vocabulary module is part of a larger classification/management package
 *However, sometimes the vocabulary module can be purchased separately
--Stand-alone:
 *The product does vocabulary management only
--Single:
 *Can mean only one workstation (or client)
 *Can mean data generally is stored on that workstation (although it can be stored on a server)
 *Means that only one user at a time can use the tool (no collision monitoring available)
--Multi-user:
 *Many users at one time (collisions detected and managed)
Vocabulary construction and maintenance
Obvious
Editing, creating
Reporting
Term usage
Term history
Search and indexing
*Exposed to end users for querying and browsing
*Exposed to indexers for term assignment
*Candidate term suggestion
Technical
*Operating system, platform
*Database software or off-site storage
*Technical support: availability?
*Who is the developer? Are there IS people on staff?
Pricing and licenses
*One-time purchase or yearly?
*Maintenance fees
*Price of new versions or other updates?
*Extra services for cost? Customization? Formatting and importing an existing thesaurus?
Acceptance
*Who uses it? Widely adopted?
*Is it a new product? Well tested?
*Product reviews
*Can you contact current users?
Documentation
*Printed?
*Online? Searchable?
*Call center? 24/7?
User experience
*Interface: can you look at it all day?
*Usability: easy to use, not needing a million clicks to accomplish a task, navigation
*Input style: drag and drop? All manual typing?
*Accessibility for disabled persons?
*Error and feedback messaging: understandable? Cryptic?
*Confirmation messages before major changes?
Data integrity
*Backup copies to roll back?
*Administrative access levels: read only, limit who can add and delete?
Structural
*Field character limits and data types
*Pre-defined fields and relationship types
*User-defined fields and relationships?
*Notation?
*Limit on levels (depth)?
*Polyhierarchical or multiple relationships between terms, such as a term being synonymous with more than one preferred term?
Editing
*How easy is it to change the status or relationships of a term?
*Deletion: global? Is the term archived or completely removed?
*Automatic relationship validation? Spell-checking?
Importing, Exporting, Reports
*Special import format?
*Mapping for heterogeneous or multilingual vocabularies
*Import/export formats: proprietary or standard? MARC? ASCII? XML?
*Report configurations: KWIC & KWOC? Alphabetical, hierarchical? By date added or last edited? By notation?
*User/use statistics?