3 Thesaurus - Prof. Dr. Knut Hinkelmann, Information Retrieval and Knowledge Organisation


  • 3 Thesaurus

    Prof. Dr. Knut Hinkelmann, Information Retrieval and Knowledge Organisation - 3 Thesaurus

    Dealing with word meanings in information retrieval

    Problem: The same meaning can be expressed using different terms, and the same term can carry different meanings. The relevant phenomena are:

    - synonyms
    - homonyms
    - related terms

    How can we ensure that, for the same meaning, identical terms are used in the index and in the query?

  • Thesaurus

    A thesaurus is an ordered compilation of terms and their descriptors that can be used for indexing, storing and retrieving information in a field of documentation.

    A thesaurus contains:

    - terms
    - relationships between terms

    Thesaurus - Definition

    A thesaurus [...] is an ordered compilation of concepts and their (predominantly natural-language) designations that serves for indexing, storing and retrieval in a field of documentation.

    It is characterised by the following features:

    - Concepts and designations are related to each other unambiguously (terminological control) in that
      - synonyms are recorded as completely as possible,
      - homonyms and polysemes are specially marked,
      - for each concept one designation (preferred term, concept number or notation) is fixed that represents the concept unambiguously.
    - Relationships between concepts (represented by their designations) are shown.

    Source: DIN 1463, Erstellung und Weiterentwicklung von Thesauri

  • Types of Thesauri

    Two kinds of thesauri can be distinguished:

    - Thesauri with preferred terms: Of the terms with the same or nearly the same meaning, only one is allowed for indexing. Preferred terms are also called descriptors.
    - Thesauri without preferred terms: Terms with similar meaning are collected in equivalence classes (sometimes called synonym sets or synsets). All terms can be used for indexing.

    preferred term = Vorzugsbezeichnung (German)
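
    The difference between the two kinds can be sketched in a few lines of Python. This is only an illustration under assumptions: the tiny vocabulary and the names `PREFERRED`, `SYNSETS`, `index_term_preferred` and `expand_term_synset` are hypothetical and not taken from any of the thesauri discussed here.

```python
# Thesaurus with preferred terms: every known term maps to exactly one descriptor.
PREFERRED = {
    "cars": "motor vehicles",
    "automobiles": "motor vehicles",
    "motor vehicles": "motor vehicles",
}

# Thesaurus without preferred terms: terms are grouped into synonym sets (synsets).
SYNSETS = [
    frozenset({"cars", "automobiles", "motor vehicles"}),
]


def index_term_preferred(term: str) -> str:
    """Return the single descriptor allowed for indexing (the term itself if unknown)."""
    key = term.lower()
    return PREFERRED.get(key, key)


def expand_term_synset(term: str) -> frozenset:
    """Return all terms equivalent to `term`; any of them may appear in the index."""
    key = term.lower()
    for synset in SYNSETS:
        if key in synset:
            return synset
    return frozenset({key})
```

    With preferred terms, normalisation happens once when indexing and once when querying. Without them, a query is typically expanded to the whole equivalence class.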

    Thesauri in the Web

    Web Thesaurus Compendium: http://www.ipsi.fraunhofer.de/~lutes/thesoecd.html

    Examples:

    - Thesauri with preferred terms
      - UNESCO Thesaurus: http://www.ulcc.ac.uk/unesco/
      - Standard Thesaurus Wirtschaft: http://www.gbi.de/thesaurus/
    - Thesauri without preferred terms
      - WordNet (a lexical database for the English language): http://wordnet.princeton.edu/
      - Open Thesaurus: http://www.openthesaurus.de

  • 3.1 Thesaurus with preferred terms

    Terms are represented as descriptors and non-descriptors.

    - Descriptor: A descriptor, also called preferred term, is the term to be used to represent a concept when indexing documents and formulating queries. A descriptor contains relationships to other descriptors and terms.
    - Non-descriptor: A non-descriptor, also called forbidden term, is a term designating a concept very close to that represented by a descriptor. Its only relationship is a reference to the corresponding descriptor.

    [Figure: example from the UNESCO Thesaurus, with the labels "descriptor" and "non-descriptor"]

    Relationships between terms

    Descriptors contain relationships to other descriptors:

    - Hierarchical relationships link terms to other terms expressing more general and more specific concepts, i.e. broader terms (BT) and narrower terms (NT).
    - Associative relationships link terms to similar terms (related terms) where the relationship between the terms is non-hierarchical. Related terms are indicated by the prefix RT.
    - Equivalence relationships link "non-preferred" terms to synonyms or quasi-synonyms which act as "preferred" terms. Non-preferred terms are indicated by the prefix UF.

    A descriptor can contain additional information:

    - Explanations of the intended use of the descriptor
    - Group (microthesaurus) the descriptor belongs to
    - Linguistic equivalence, which designates the same concept in different languages for multilingual thesauri
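
    One possible way to hold this information in memory is sketched below. The class `Descriptor`, its field names and the `render` method are assumptions made for illustration, and the sample values are example data only.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One descriptor entry with the relationship types named above."""
    name: str
    broader: list = field(default_factory=list)        # BT
    narrower: list = field(default_factory=list)       # NT
    related: list = field(default_factory=list)        # RT
    used_for: list = field(default_factory=list)       # UF (non-preferred terms)
    scope_note: str = ""                                # explanation of intended use
    microthesaurus: str = ""                            # group the descriptor belongs to
    translations: dict = field(default_factory=dict)   # language code -> term

    def render(self) -> str:
        """Format the entry as 'PREFIX term' lines, one relation per line."""
        lines = [self.name]
        for prefix, terms in (("UF", self.used_for), ("BT", self.broader),
                              ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"  {prefix} {term}" for term in terms)
        return "\n".join(lines)


entry = Descriptor(
    name="Motor vehicles",
    broader=["Vehicles"],
    related=["Road transport"],
    used_for=["Cars"],
)
print(entry.render())
```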

  • Relations: German and English

    German                                                 English
    Abbr.  Denomination                                    Abbr.  Denomination

    Hierarchy relations
    TT     Top Term (allgemeinster Begriff)                TT     Top term
    OB     Übergeordneter Begriff (Oberbegriff)            BT     Broader term
    UB     untergeordneter Begriff (Unterbegriff)          NT     Narrower term

    Hierarchy relations distinguishing between abstraction and aggregation
    OA     Oberbegriff Abstraktionsrelation                BTG    Broader term generic
    UA     Unterbegriff Abstraktionsrelation               NTG    Narrower term generic
    SP     Verbandbegriff                                  BTP    Broader term partitive
    TP     Teilbegriff                                     NTP    Narrower term partitive

    Equivalence relations and associations
    BS     Benutztes Synonym oder Quasi-Synonym            USE    Use
    BF     Benutzt für Synonym oder Quasi-Synonym          UF     Used for
    VB     verwandter Begriff                              RT     Related term
    BK     Benutzte Kombination von Einfachdeskriptoren    USE    Use
    KB     Benutzt in Kombination von Einfachdeskriptoren  UFC    Used for combination

    Source: DIN 1463, Erstellung und Weiterentwicklung von Thesauri
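
    The correspondence listed above can be used to convert entries written with German abbreviations into the English notation. The following is a minimal sketch; the names `GERMAN_TO_ENGLISH` and `translate_relation` and the sample terms are hypothetical.

```python
# German abbreviation -> English abbreviation, as listed above.
GERMAN_TO_ENGLISH = {
    "TT": "TT", "OB": "BT", "UB": "NT",
    "OA": "BTG", "UA": "NTG", "SP": "BTP", "TP": "NTP",
    "BS": "USE", "BF": "UF", "VB": "RT", "BK": "USE", "KB": "UFC",
}


def translate_relation(line: str) -> str:
    """Replace a leading German relation abbreviation by its English counterpart."""
    abbreviation, _, term = line.strip().partition(" ")
    english = GERMAN_TO_ENGLISH.get(abbreviation, abbreviation)
    return f"{english} {term}".strip()
```

    Note that the mapping cannot simply be inverted, because both BS and BK correspond to USE.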

    Equivalence Relation - Synonyms

    Semantic equivalence is a relation between terms with (nearly) the same meaning. It is expressed by two symbols:

    - USE is used in non-descriptors and refers to the corresponding descriptor.

      Example:
        Cars
          USE Motor vehicles

    - UF (= Used For) is used in descriptors and refers to synonymous non-descriptors.

      Example:
        Motor vehicles
          UF Cars
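
    Because USE and UF are two views of the same relation, one can be derived from the other. The sketch below stores only the USE references and computes the UF lists; the function names `build_used_for` and `normalise` are assumptions for illustration.

```python
from collections import defaultdict

# USE references as stored in the non-descriptors: non-descriptor -> descriptor.
USE = {
    "Cars": "Motor vehicles",
    "Automobiles": "Motor vehicles",
}


def build_used_for(use: dict) -> dict:
    """Invert the USE references to obtain the UF lists of the descriptors."""
    used_for = defaultdict(list)
    for non_descriptor, descriptor in use.items():
        used_for[descriptor].append(non_descriptor)
    return {descriptor: sorted(terms) for descriptor, terms in used_for.items()}


def normalise(term: str, use: dict) -> str:
    """Map an index or query term to its descriptor; descriptors map to themselves."""
    return use.get(term, term)
```

    Applying `normalise` to both index terms and query terms is one way to make sure that the same term is used on both sides.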

  • Descriptors and Non-Descriptors

    Descriptors
    - may have zero, one or more corresponding non-descriptors
    - have relations to other descriptors

    Non-descriptors
    - must refer to exactly one descriptor (relation USE)
    - do not have any other relation

    Example from the UNESCO thesaurus:

    Descriptor:

        Motor vehicles
          MT 6.60 Equipment and facilities
          UF Automobiles
          UF Cars
          UF Trucks
          BT Vehicles
          RT Road Engineering
          RT Road Transport

    Non-descriptors:

        Automobiles
          USE Motor vehicles

        Cars
          USE Motor vehicles

        Trucks
          USE Motor vehicles
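
    The two rules for non-descriptors can be checked mechanically. The following sketch assumes entries are given as text with the term on the first line and one "SYMBOL target" pair on each following line; `parse_entry` and `check_non_descriptor` are hypothetical helper names.

```python
def parse_entry(text: str) -> tuple:
    """Parse a textual entry into (term, [(symbol, target), ...])."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    term, relations = lines[0], []
    for line in lines[1:]:
        symbol, _, target = line.partition(" ")
        relations.append((symbol, target))
    return term, relations


def check_non_descriptor(term: str, relations: list) -> list:
    """Return the rule violations for an entry that is meant to be a non-descriptor."""
    problems = []
    uses = [target for symbol, target in relations if symbol == "USE"]
    if len(uses) != 1:
        problems.append(f"{term}: expected exactly one USE reference, found {len(uses)}")
    others = [symbol for symbol, _ in relations if symbol != "USE"]
    if others:
        problems.append(f"{term}: non-descriptor has other relations: {others}")
    return problems
```

    For the entry "Cars / USE Motor vehicles" the check returns an empty list. An entry that also carried a BT relation would be reported.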

    Hierarchy

    In general, a hierarchy is represented by two relations:

    - BT (= Broader Term) relates a descriptor to a more generic descriptor.

      Example:
        Banks
          BT Financial institutions
          BT2 Finance

    - NT (= Narrower Term) relates a descriptor to a more specific descriptor.

      Example:
        Financial institutions
          NT Banks
          BT Finance

    In the UNESCO thesaurus, a digit to the right of the symbols BT or NT indicates the number of hierarchical levels separating the descriptors.

    [Figure: hierarchy with Finance at the top, Financial institutions below it, and Banks at the bottom]
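
    The level digits can be derived from the direct BT links alone. The sketch below assumes that each descriptor has at most one direct broader term, which the text does not guarantee for every thesaurus; `BROADER` and `broader_terms` are hypothetical names.

```python
# Direct broader-term links (BT): descriptor -> broader descriptor.
BROADER = {
    "Banks": "Financial institutions",
    "Financial institutions": "Finance",
}


def broader_terms(term: str, broader: dict) -> list:
    """Walk up the hierarchy and label each ancestor BT, BT2, BT3, ..."""
    result, level, seen = [], 1, {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        symbol = "BT" if level == 1 else f"BT{level}"
        result.append((symbol, term))
        level += 1
    return result


print(broader_terms("Banks", BROADER))
# [('BT', 'Financial institutions'), ('BT2', 'Finance')]
```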

  • Specific Hierarchies

    There are thesauri that distinguish between different types of hierarchies:

    - Generic relation (specific vs. generic terms): the narrower term is more specific than the broader term.

      Example:
        Vehicles
          NTG Motor vehicles
          NTG Bicycles

        Motor vehicles
          BTG Vehicles

        Bicycles
          BTG Vehicles

    - Partitive relation: the narrower term is part of the broader term.

      Example:
        Motor vehicles
          NTP Engines

        Engines
          BTP Motor vehicles

    [Figure: Vehicles with the narrower terms Bicycles and Motor vehicles; Motor vehicles with the part Engines]
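
    Typed hierarchy links can be stored once and shown from both sides. This is a sketch under assumptions: the triple layout and the names `LINKS`, `INVERSE` and `entry_lines` are illustrative only.

```python
# (narrower term, relation, broader term); BTG = generic, BTP = partitive.
LINKS = [
    ("Motor vehicles", "BTG", "Vehicles"),
    ("Bicycles", "BTG", "Vehicles"),
    ("Engines", "BTP", "Motor vehicles"),
]

# Each broader-term relation has a narrower-term counterpart of the same type.
INVERSE = {"BTG": "NTG", "BTP": "NTP"}


def entry_lines(term: str, links: list) -> list:
    """List all typed hierarchy relations of `term`, including inverse NTG/NTP links."""
    lines = []
    for narrower, relation, broader in links:
        if narrower == term:
            lines.append(f"{relation} {broader}")
        if broader == term:
            lines.append(f"{INVERSE[relation]} {narrower}")
    return lines
```

    Keeping the two types apart matters when following links: an engine is part of a motor vehicle, but it is not a kind of vehicle.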

    Association RT

    RT (= Related Term) is a relation between two descriptors that is neither hierarchical nor an equivalence relation.

    Different kinds of relations can be expressed as an association, e.g.

    - Descriptors that are at the same level in a hierarchy: Diesel engine RT Otto engine; Apple RT Pear
    - Descriptors that are part of a common thing: Solothurn RT Aargau
    - Antonym (opposite): Heat RT Cold
    - Successor relation: Father RT Son
    - Functional or causal relation: Book RT Reading
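
    One common use of RT links is to widen a query with related descriptors. The sketch below assumes that RT is stored reciprocally (if A RT B, then B RT A); the query expansion itself is an illustration that goes beyond what is stated above, and `build_related` and `expand_query` are hypothetical names.

```python
from collections import defaultdict


def build_related(pairs: list) -> dict:
    """Store each RT pair in both directions so the relation is reciprocal."""
    related = defaultdict(set)
    for a, b in pairs:
        related[a].add(b)
        related[b].add(a)
    return dict(related)


def expand_query(terms: list, related: dict) -> list:
    """Return the query terms followed by their related terms, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for candidate in sorted(related.get(term, ())):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


RELATED = build_related([("Diesel engine", "Otto engine"), ("Heat", "Cold")])
```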

  • Structure of the Thesaurus

    The UNESCO thesaurus is organised into subject fields and microthesauri.

    - Field names: A field is a grouping of microthesauri. A field name is preceded by a one-digit serial number.
    - Microthesaurus names: A microthesaurus is a grouping of descriptors and non-descriptors. A microthesaurus name is preceded by a three-digit serial number; the first digit is the number of the subject field to which the microthesaurus belongs.

    [Figure: example of a subject field and its microthesauri]
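
    The numbering scheme makes it possible to read the subject field directly from a microthesaurus reference. The sketch below assumes the dotted notation seen in the entry for Motor vehicles ("MT 6.60 Equipment and facilities") applies to all microthesauri; `MT_PATTERN` and `parse_microthesaurus` are hypothetical names.

```python
import re

# One field digit, a dot, two further digits, then the microthesaurus name.
MT_PATTERN = re.compile(r"^MT\s+(\d)\.(\d{2})\s+(.+)$")


def parse_microthesaurus(line: str) -> dict:
    """Split an MT line into subject field number, microthesaurus number and name."""
    match = MT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a microthesaurus line: {line!r}")
    field_digit, serial, name = match.groups()
    return {
        "field": int(field_digit),            # first digit = subject field
        "number": f"{field_digit}.{serial}",  # three digits in total
        "name": name,
    }
```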

    Other Descriptor Information

    Descriptors in the UNESCO thesaurus also contain:

    - Explanation: explains the use for which a descriptor is intended. Explanations in the UNESCO thesaurus are called Scope Notes (SN).
    - Inclusion: reference between a descriptor and the microthesaurus to which it belongs, shown by the symbol MT.
    - Linguistic equivalence: relation between descriptors designating the same conce