Thesaurus Management Tools



Thesaurus Management Tools. TermTree. Melissa Riesland / ASIS&T PNC 2002.


  • Thesaurus Management Tools. Melissa Riesland / ASIS&T PNC 2002. TermTree

  • Definitions

  • Why control? The Great Pop vs. Soda Controversy. Source:

  • Red Hot Chili Peppers

    red hot peppers, red hot chilly peppers, red hot chillipepper, red hot chillie peppers, red hot chilli peppers, red hot chilli pepers, red hot chill peppers, red hot chilipeppers, red hot chili pepper, red hot chili pepers, red hot chili, red hotchilli peppers, chili peppers, the red hot chili peppers, red chili peppers

  • Relationships: Hierarchical



  • Hierarchical: parent/child, or trees

    Cats
      NT Domestic Cats
        NT Persians
      NT Wild Cats
        NT Tigers
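The narrower-term (NT) tree on this slide can be sketched as a small Python structure. This is only an illustration: the nesting (Persians under Domestic Cats, Tigers under Wild Cats) is assumed from the order of the terms, and the function names are hypothetical.

```python
# Narrower-term (NT) links from the slide; the nesting is an assumption
# based on the order in which the terms are listed.
NARROWER = {
    "Cats": ["Domestic Cats", "Wild Cats"],
    "Domestic Cats": ["Persians"],
    "Wild Cats": ["Tigers"],
}


def broader_terms(term):
    """Return the chain of broader terms (BT) from `term` up to the top term."""
    parent_of = {child: parent
                 for parent, children in NARROWER.items()
                 for child in children}
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def render(term, depth=0):
    """Return the hierarchy under `term` as indented lines."""
    prefix = "  " * depth + ("NT " if depth else "")
    lines = [prefix + term]
    for child in NARROWER.get(term, []):
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render("Cats")))
print(broader_terms("Persians"))
```

Storing only the NT direction and deriving BT from it is one way to keep the two directions from drifting apart.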

  • Equivalent: Pop, Soda, Coke. Source:

    soft drink; tonic; soda pop, soda water; cola, cocola; pepsi; cold drink; carbonated beverage; fizzy drink; sodie, sodie pop, sody pop, sodee
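Equivalence relationships are often implemented as an entry vocabulary that maps each variant to one preferred term. The sketch below assumes "soft drink" as the preferred term purely for illustration (the slide does not name one), and `preferred_term` is a hypothetical helper.

```python
# Assumption: "soft drink" is the preferred term; the slide does not say which
# of these variants a real vocabulary would prefer.
PREFERRED = "soft drink"
VARIANTS = ["pop", "soda", "coke", "tonic", "soda pop", "cola",
            "fizzy drink", "sodie", "sody pop"]

# USE references: variant -> preferred term.
USE = {variant: PREFERRED for variant in VARIANTS}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def preferred_term(query):
    """Map a user's wording to the preferred term, or return it normalized."""
    key = normalize(query)
    return USE.get(key, key)


print(preferred_term("  Soda   Pop "))
print(preferred_term("Lemonade"))
```

The same lookup would cover the spelling variants gathered from query logs, as long as each variant is entered as a key.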

  • Associative: Dogs, Dog houses

  • Classification: A logical system for the arrangement of knowledge. -- Lois Mai Chan

    Source: Cataloging and Classification: An Introduction, 2nd ed., McGraw-Hill, 1994.

  • Classification: classes and subclasses; common and distinguishing characteristics; pre-established principles; items in a collection or entries in an index, bibliography, or catalog; access and retrieval

    -- Online Dictionary of Library and Information Science (ODLIS). Source:

  • Classification

    Symbolic: represent classes and subclasses of concepts using letters, numbers, or a combination of the two.

    Natural language: traditionally represent relationships using nouns and noun phrases.

  • Controlled Vocabularies: authority lists, gazetteers, glossaries; taxonomies; thesauri; topic maps

  • Authority Lists: Christie, Agatha, 1890-1976

    100 1_ |a Christie, Agatha, |d 1890-1976
    500 1_ |w nnnc |a Mallowan, Agatha Christie, |d 1890-1976
    500 1_ |w nnnc |a Westmacott, Mary, |d 1890-1976

    Source:
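The display-form fields on this slide (tag, indicators, then `|`-delimited subfields) can be split apart with a few lines of Python. This is a sketch for the text layout shown here only, not a reader for the MARC exchange format, and `parse_field` is a hypothetical helper.

```python
RECORD = [
    "100 1_ |a Christie, Agatha, |d 1890-1976",
    "500 1_ |w nnnc |a Mallowan, Agatha Christie, |d 1890-1976",
    "500 1_ |w nnnc |a Westmacott, Mary, |d 1890-1976",
]


def parse_field(line):
    """Split one display-form field into tag, indicators, and subfields."""
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    for chunk in rest.split("|")[1:]:  # nothing precedes the first "|"
        code, value = chunk[0], chunk[1:].strip()
        subfields.append((code, value))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


fields = [parse_field(line) for line in RECORD]

# The 100 field carries the established heading; the 500 fields carry
# related names that point back to it.
heading = next(f for f in fields if f["tag"] == "100")
related = [dict(f["subfields"])["a"].rstrip(",")
           for f in fields if f["tag"] == "500"]

print(dict(heading["subfields"])["a"].rstrip(","))
print(related)
```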

  • Gazetteers Source:

  • Gazetteers

  • Glossaries

  • Taxonomies vs. Thesauri: Shared characteristics

    Traditional relationships

    Traditional uses

    Specific to topic or collection

  • Taxonomies vs. Thesauri: Distinguishing characteristics

    Information access

    Information retrieval

  • Information Access: Browsing, aka buckets

  • Information Access: Navigation

  • Information Retrieval: Explicit details

  • Trees & Webs: Taxonomies & Thesauri; Topic Maps

  • Topic Maps: Traditional

    Knowledge structures

    Information resources

  • Topic Maps

    Facet analysis and synthesis, 260-262, 280, 484
    Field tags in MARC (see Fields and subfields; Subfield codes)
    Form headings and subdivisions:
      defined, 485
      in LCSH, 176, 183
      in Sears, 213, 218-219
    Government bodies and officials, 140-141 (See also Heads of state and governments)

    Topics, Occurrences, Associations

  • Topic Maps

    Virginia Woolf wrote To the Lighthouse
    Virginia Woolf was married to Leonard Woolf
    Leonard Woolf founded the Hogarth Press
    The Hogarth Press published T.S. Eliot's The Waste Land
    Eliot was influenced by Ezra Pound
    Etc. etc.
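The chain of statements on this slide can be modeled as topics joined by typed associations. The sketch below is a plain Python illustration, not a Topic Maps implementation; the triple layout and function names are assumptions, and the "T.S. Eliot wrote The Waste Land" association is inferred from the possessive on the slide.

```python
from collections import deque

# (topic, association type, topic)
ASSOCIATIONS = [
    ("Virginia Woolf", "wrote", "To the Lighthouse"),
    ("Virginia Woolf", "was married to", "Leonard Woolf"),
    ("Leonard Woolf", "founded", "Hogarth Press"),
    ("Hogarth Press", "published", "The Waste Land"),
    ("T.S. Eliot", "wrote", "The Waste Land"),  # inferred from "Eliot's"
    ("T.S. Eliot", "was influenced by", "Ezra Pound"),
]


def neighbors(topic):
    """Yield every topic that shares an association with `topic`."""
    for left, _kind, right in ASSOCIATIONS:
        if left == topic:
            yield right
        elif right == topic:
            yield left


def path(start, goal):
    """Breadth-first search for the shortest chain of associated topics."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for nxt in neighbors(route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None


print(" -> ".join(path("Virginia Woolf", "Ezra Pound")))
```

Because associations are followed in both directions, a reader can navigate from any topic to any connected topic, which is the "web" quality that sets topic maps apart from trees.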

  • Why buy?

    Data integrity: orphans, circular references, misspellings, inversions
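Two of these integrity checks, orphans and circular references, are simple to express in code, which gives a sense of what a thesaurus tool automates. The sketch below uses a hypothetical five-term vocabulary with one broader term (BT) per term; the data and function names are illustrative assumptions, not any product's behavior.

```python
# Hypothetical vocabulary for illustration.
TERMS = {"Music", "Jazz", "Classic Rock", "1970s Music", "Polka"}

# term -> its broader term (BT)
BROADER = {
    "Jazz": "Music",
    "Classic Rock": "1970s Music",
    "1970s Music": "Classic Rock",  # circular: each term is the other's BT
}


def orphans(terms, broader):
    """Terms that take part in no hierarchical relationship at all."""
    linked = set(broader) | set(broader.values())
    return sorted(terms - linked)


def in_cycle(start, broader):
    """True if following BT links from `start` leads back to `start`."""
    term = broader.get(start)
    steps = 0
    while term is not None and steps < len(broader):
        if term == start:
            return True
        term = broader.get(term)
        steps += 1
    return False


def circular_terms(broader):
    """All terms that sit on a circular chain of broader terms."""
    return sorted(term for term in broader if in_cycle(term, broader))


print(orphans(TERMS, BROADER))
print(circular_terms(BROADER))
```

The step limit in `in_cycle` keeps the walk from looping forever when a term leads into a cycle that it is not itself part of.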

  • Why buy? Time: cross-checking & proofreading; software development time

  • Why buy? Cost: Time = $$$; In-house development = $$$

  • Choices, Choices: Automated or manual

    Bundled or stand-alone

    Single or multi-user

  • Tasks: Vocabulary construction and maintenance


    Search and indexing

  • Criteria: Technical

    Pricing and licenses


  • Criteria: Documentation

    User experience

    Data integrity

  • Criteria: Structural


    Importing, exporting, and reports

  • Products Referenced in This Presentation: MultiTes - - - - Harmony -

    For information on more software products and how to select products, see

  • The End. E-mail:


    Who am I and what do I do
    --taxonomy development
    --information retrieval

    Why this presentation
    --needed a tool
    --not enough engineering staff for development, so given go-ahead to purchase out-of-the-box software
    --created spreadsheet comparing characteristics of candidate products

    Plan
    --discuss controlled vocabulary concepts
    --why buy?
    --what to consider
    --product taste testing

    People just don't say or spell things the same way

    From a recent gathering of variants in our query logs

    I.e., call letters

    I.e., controlled vocabularies

    Both are hierarchical (trees) and usually have associative and equivalent relationships as well. Both have applications for indexing, navigation, and search. Both typically are built with a specific topic area or collection in mind.

    Based on traditional indexing concepts.
    --Knowledge structures: topics and relations/associations
    --Information resources: occurrences

    Just as SGML was originally developed for print publishing (this is a header, this is body text), topic maps were originally conceived for representing indexes for complex information. They evolved into a navigational aid that encompasses the characteristics of taxonomies and thesauri, with particular utility for electronic documentation.

    Topic map sits above the occurrences. It is not built in response to a body of documents; it is a stand-alone structure to which occurrences attach. Nodes. A variety of relationships/associations is available, not limited to the three traditional types.

    Government and Politics vs. Politics and Government

    Classic Rock -> 1970s Music -> Classic Rock

    Variety of products starting at less than $500

    Average full-time worker:
    --between $50,000 and $100,000 per year, or $4,167 to $8,333 per month
    --Bureau of Labor Statistics National Compensation Survey: $28-$39 per hour (plus benefits, capital expenses, and other forms of compensation)
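The arithmetic behind these figures is short enough to check in a few lines. The sketch below takes the salary range, the hourly wage range, and the "less than $500" entry price from these notes; the helper names are hypothetical, and the comparison ignores benefits and other overhead.

```python
def monthly(annual_salary):
    """Convert an annual salary to a rounded monthly figure."""
    return round(annual_salary / 12)


def hours_to_match(price, hourly_wage):
    """Hours of staff time whose wages alone equal the purchase price."""
    return round(price / hourly_wage, 1)


# Monthly equivalents of the annual range quoted in the notes.
print(monthly(50_000), monthly(100_000))

# Entry-level product price from the notes ("less than $500"), compared with
# the hourly wage range quoted from the survey.
PRODUCT_PRICE = 500
print(hours_to_match(PRODUCT_PRICE, 28), hours_to_match(PRODUCT_PRICE, 39))
```

On these numbers, an entry-level product costs about as much as two working days of one person's wages, which is the core of the "time = $$$" argument.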

    Quickly exceed budget. Can be painful, as you have to live with a product that is still in development.

    --Automated
      *Little or no human intervention; usually uses rules or training sets; derives vocabulary from the collection itself
      *Sometimes comes with its own built-in vocabulary, which is very broad
    --Manual
      *You do all the work
      *The only automatic features are cross-checking references, global changes, report generation, and sometimes spell-checking, etc.

    --Bundled
      *Vocabulary module is part of a larger classification/management package
      *However, sometimes the vocabulary module can be purchased separately
    --Stand-alone
      *The product does vocabulary management only

    --Single user
      *Can mean only one workstation (or client)
      *Can mean data generally is stored on that workstation (although it can be stored on a server)
      *Means that only one user at a time can use the tool (no collision monitoring available)
    --Multi-user
      *Many users at one time (collisions detected and managed)

    Vocabulary construction and maintenance
    --Obvious: editing, creating

    --Reporting: term usage, term history

    Search and indexing
    --Exposed to end users for querying and browsing
    --Exposed to indexers for term assignment
    --Candidate term suggestion

    Technical
    *Operating system, platform
    *Database software or off-site storage
    *Technical support: availability?
    *Who is the developer? Are there IS people on staff?

    Pricing and licenses
    *One-time purchase or yearly
    *Maintenance fees
    *Price of new versions or other updates?
    *Extra services for cost? Customization? Formatting and importing existing thesaurus?

    Acceptance
    *Who uses it? Widely adopted?
    *Is it a new product? Well tested?
    *Product reviews
    *Can you contact current users?

    Documentation
    *Printed?
    *Online? Searchable?
    *Call center? 24/7?

    User experience
    *Interface: can you look at it all day?
    *Usability: easy to use, not needing a million clicks to accomplish a task, navigation
    *Input style: drag and drop? All manual typing?
    *Accessibility for disabled persons?
    *Error and feedback messaging understandable? Cryptic?
    *Confirmation messages before major changes?

    Data integrity
    *Backup copies to roll back?
    *Administrative access levels: read only, limit who can add and delete?

    Structural
    *Field character limits and data types
    *Pre-defined fields and relationship types
    *User-defined fields and relationships?
    *Notation?
    *Limit levels (depth)?
    *Polyhierarchical or multiple relationships between terms, such as a term being synonymous to more than one preferred term?

    Editing
    *How easy to change status or relationships of a term?
    *Deletion. Global? Is term archived or completely removed?
    *Automatic relationship validation? Spell-checking?

    Importing, exporting, reports
    *Special import format?
    *Mapping for heterogeneous or multilingual vocabularies
    *Import/export formats: proprietary or standard? MARC? ASCII? XML?
    *Report configurations: KWIC & KWOC? Alpha, hierarchical? By date added or last edited? By notation?
    *User/use statistics?
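For readers unfamiliar with the KWIC (keyword in context) report named in the last criterion, the sketch below shows one simple way such a listing could be produced: every word of every term becomes a sort key, and the term is printed with that word aligned in a fixed column. The function name, column width, and sample terms are illustrative assumptions; real products format these reports in their own ways.

```python
def kwic(terms, width=20):
    """Return KWIC lines: one per word, sorted by keyword, keyword aligned."""
    rows = []
    for term in terms:
        words = term.split()
        for i, word in enumerate(words):
            left = " ".join(words[:i])    # context before the keyword
            right = " ".join(words[i:])   # keyword plus context after it
            rows.append((word.lower(), f"{left:>{width}}  {right}"))
    return [line for _key, line in sorted(rows)]


for line in kwic(["Domestic Cats", "Wild Cats", "Dog houses"]):
    print(line)
```

A KWOC (keyword out of context) report would instead print each keyword as a heading, followed by the full terms that contain it.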