Languages are bridges … not barriers
Chiara Carlucci – CEDEFOP Library
ReferNet Technical Meeting
24-25 September 2009
For sure there is a place for thesauri, but they must change in order to continue to be of value. A true thesaurus has equivalence relationships, but it also supports other kinds of relationship and provides navigation assistance by means of scope notes and other aids.
What
A thesaurus suggests other ways of expressing an idea that is already in the user's mind and reminds the user of related ideas that might be valuable in searching.
What
It is useful to recount some classic moments of indexing, because documents are changing rapidly, because the habit of doing the same things leads to repetitive, unreflective behaviour, and because the thesaurus is meant to be used as a thesaurus!
What
It must be remembered that, though a thesaurus appears to be made up of natural-language terms, it is an artificial language: a controlled vocabulary with a limited number of descriptors, the meaning of each being understood through the context provided by the descriptors as a whole.
In a bibliographical context (such as VET-Bib), the information provided by the whole system of descriptors is also supported by:
– the title of the document
– the abstract of the document
What
• It is not:
– a dictionary, which contains definitions and pronunciations. Unlike a dictionary entry, a thesaurus entry does not define words.
– a glossary, which contains explanations of concepts relevant to a certain field of study or action.
– a lexicon, because the lexicon of a language is its vocabulary, including its words and expressions.
– a vocabulary, which is the set of words a person is familiar with in a language. A vocabulary usually grows and evolves with age, and serves as a useful and fundamental tool for communication and acquiring knowledge.
What
The thesaurus is a thesaurus
With its own hierarchical relationships, used to indicate terms that are narrower or broader in scope. A "Broader Term" (BT) is a more general term, e.g. “Apparatus” is a generalization of “Computers”. Reciprocally, a "Narrower Term" (NT) is a more specific term, e.g. “Digital computers” is a specialization of “Computers”. BT and NT are reciprocals: a broader term necessarily implies at least one other term that is narrower. BT and NT are used to indicate class relationships as well as part-whole relationships.
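To make the BT/NT reciprocity concrete, here is a minimal Python sketch in which recording one BT link automatically records the matching NT link. The class name `Hierarchy` and its methods are hypothetical, chosen for illustration; they are not part of ETT or of any Cedefop tool.

```python
from collections import defaultdict


class Hierarchy:
    """Hypothetical store for BT/NT links that keeps them reciprocal."""

    def __init__(self) -> None:
        self._bt = defaultdict(set)  # term -> its broader terms (BT)
        self._nt = defaultdict(set)  # term -> its narrower terms (NT)

    def add_bt(self, narrower: str, broader: str) -> None:
        # Recording "narrower BT broader" also records "broader NT narrower",
        # so the two indicators can never get out of step.
        self._bt[narrower].add(broader)
        self._nt[broader].add(narrower)

    def broader(self, term: str) -> set:
        return set(self._bt.get(term, ()))

    def narrower(self, term: str) -> set:
        return set(self._nt.get(term, ()))


h = Hierarchy()
h.add_bt("Computers", "Apparatus")
h.add_bt("Digital computers", "Computers")
print(h.narrower("Apparatus"))         # {'Computers'}
print(h.broader("Digital computers"))  # {'Computers'}
```

Because both directions are written in one call, a narrower term can never lack its broader term (or vice versa), which is exactly the reciprocity rule stated on the slide.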
What
With its own equivalence relationships, used primarily to connect synonyms and near-synonyms. The Use (USE) and Used For (UF) indicators are used when an authorized term is to be used in place of another, unauthorized term: the entry for the authorized term carries UF and, reciprocally, the entry for the unauthorized term carries the indicator USE. Unauthorized terms are often called "entry vocabulary", "entry points", "lead-in terms" or "non-preferred terms"; they point to the authorized term (also referred to as the Preferred Term or Descriptor) that has been chosen to stand for the concept.
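A minimal sketch of how USE/UF entries can redirect entry vocabulary to the descriptor, assuming a simple in-memory table. `EquivalenceTable` and the term pair "key competences" / "key skills" are hypothetical illustrations, not entries quoted from ETT.

```python
class EquivalenceTable:
    """Hypothetical USE/UF table: non-preferred terms point to one descriptor."""

    def __init__(self) -> None:
        self.use = {}  # non-preferred term -> preferred term (USE)
        self.uf = {}   # preferred term -> non-preferred terms (UF)

    def add_uf(self, preferred: str, non_preferred: str) -> None:
        # One call records both directions: "preferred UF non_preferred"
        # and, reciprocally, "non_preferred USE preferred".
        self.use[non_preferred] = preferred
        self.uf.setdefault(preferred, set()).add(non_preferred)

    def resolve(self, term: str) -> str:
        # Entry vocabulary is redirected to the descriptor; any other term
        # is returned unchanged.
        return self.use.get(term, term)


table = EquivalenceTable()
table.add_uf("key competences", "key skills")  # hypothetical pair
print(table.resolve("key skills"))  # key competences
```

An indexer or searcher who types the lead-in term is silently sent to the descriptor, so every record on that concept sits under one term.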
The thesaurus is a thesaurus
What
The thesaurus is a thesaurus
With its own associative relationships, used to connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is indicated by "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RT reduces specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, an associative relationship should not be established.
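Whether a user searching with "A" also wants "B" is an editorial judgement, but the structural part of the rule can be checked mechanically. The sketch below, with a hypothetical class `AssociativeLinks` and illustrative terms, stores RT links symmetrically and refuses pairs already linked hierarchically or by equivalence.

```python
class AssociativeLinks:
    """Hypothetical RT store that refuses links already covered by BT/NT or USE/UF."""

    def __init__(self) -> None:
        self.hierarchical = set()  # unordered pairs linked by BT/NT
        self.equivalent = set()    # unordered pairs linked by USE/UF
        self.rt = {}               # term -> set of related terms

    def add_hierarchical(self, a: str, b: str) -> None:
        self.hierarchical.add(frozenset((a, b)))

    def add_equivalent(self, a: str, b: str) -> None:
        self.equivalent.add(frozenset((a, b)))

    def add_rt(self, a: str, b: str) -> None:
        pair = frozenset((a, b))
        # RT is only for pairs that are neither hierarchical nor equivalent.
        if a == b or pair in self.hierarchical or pair in self.equivalent:
            raise ValueError(f"RT not allowed between {a!r} and {b!r}")
        # RT is symmetric, so both entries are updated.
        self.rt.setdefault(a, set()).add(b)
        self.rt.setdefault(b, set()).add(a)


links = AssociativeLinks()
links.add_hierarchical("Computers", "Apparatus")
links.add_rt("teachers", "teacher training")  # hypothetical related pair
```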
What
• To translate the concept you are looking for into keywords
• Multilingualism and standardisation are the main advantages of this powerful indexing tool covering the field of VET
• The thesaurus is an operational tool used to retrieve documents according to their semantic content
• The thesaurus should be made available to users to help them identify their information needs
• The thesaurus provides a conceptual framework for understanding reality through graphic presentations that preserve specificity
• It presents the conceptual content of documents unambiguously.
Why
• A thesaurus is well suited to the digital environment, which shows its versatility
• It is open to information interoperability, because the thesaurus is not only an operating environment but also an organisational criterion
• It can be integrated with other information retrieval tools
Why
ETT (the European Training Thesaurus) is used to index and represent the content of a document. It is mostly used by documentalists and librarians to identify the concepts set out in the text and to represent them by assigning keywords from the thesaurus. This enables the relevant records to be extracted from a collection of bibliographic references or from a full-text documentary database to answer the user’s query. End-users can combine ETT descriptors to represent their search query. Indexing with ETT enables all documents on the same subject to be retrieved through a single query.
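The retrieval step described above can be sketched as an inverted index from descriptors to records, with combined descriptors intersected (Boolean AND). This is a minimal illustration under assumed data: the record IDs and descriptor assignments are hypothetical, and the functions are not VET-Bib's actual search engine.

```python
from collections import defaultdict


def build_index(records):
    """records: record_id -> descriptors assigned by the indexer."""
    index = defaultdict(set)
    for record_id, descriptors in records.items():
        for descriptor in descriptors:
            index[descriptor].add(record_id)
    return index


def search_all(index, *descriptors):
    """Return the records indexed with every given descriptor (Boolean AND)."""
    if not descriptors:
        return set()
    result = set(index.get(descriptors[0], set()))
    for descriptor in descriptors[1:]:
        result &= index.get(descriptor, set())
    return result


records = {  # hypothetical records and descriptor assignments
    "r1": ["apprenticeship", "youth unemployment"],
    "r2": ["apprenticeship"],
    "r3": ["youth unemployment"],
}
index = build_index(records)
print(sorted(search_all(index, "apprenticeship")))                        # ['r1', 'r2']
print(sorted(search_all(index, "apprenticeship", "youth unemployment")))  # ['r1']
```

Every record on a subject is found through its descriptor, whatever words its title uses, and each extra descriptor in the query narrows the result set.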
Why
ETT is useful for taxonomy and semantic web applications. The main role of a thesaurus is to standardise the indexing process in order to make searches simpler, more efficient and consistent regardless of the language of the query. ETT is a multilingual conceptual thesaurus that strives to satisfy both Community and national needs on a wide range of subjects.
Each descriptor corresponds to one concept, expressed in each of the languages.
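One way to model "one concept, one descriptor per language" is a language-independent concept ID holding a label for each language. The sketch below assumes that model; the ID `C001` and the French and German labels are illustrative, not quoted from ETT.

```python
# Hypothetical concept record: one language-independent ID, one label per
# language. The labels are illustrative, not quoted from ETT.
CONCEPTS = {
    "C001": {
        "en": "transparency of qualifications",
        "fr": "transparence des qualifications",
        "de": "Transparenz der Qualifikationen",
    },
}


def build_lookup(concepts):
    """Map every label, in every language, to its concept ID."""
    lookup = {}
    for concept_id, labels in concepts.items():
        for label in labels.values():
            lookup[label.casefold()] = concept_id
    return lookup


def translate(term, target_lang, concepts, lookup):
    """Return the descriptor for the same concept in another language, or None."""
    concept_id = lookup.get(term.casefold())
    if concept_id is None:
        return None
    return concepts[concept_id].get(target_lang)


LOOKUP = build_lookup(CONCEPTS)
print(translate("Transparence des qualifications", "en", CONCEPTS, LOOKUP))
# transparency of qualifications
```

Keying the lookup on the bare label is a simplification: if two languages used the same string for different concepts, a real system would need to key on the pair (language, label).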
Why
Another interesting option offered by ETT is the possibility for users to ask questions in one language and retrieve answers in different languages, something Google doesn’t do, or not yet!
Why
It is only a term
Why
In this case the descriptor ‘transparency of qualifications’ represents a precise concept and can retrieve many web pages, not necessarily documents, that contain the descriptor in exactly that form in the text
Why
In this case ‘transparency of qualifications’ is more than a descriptor: it is a concept. We can find documents on the subject even if:
1. the term does not appear in the text
2. the document is in a different language.
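The difference between matching a term and matching a concept can be sketched in a few lines. The documents, their concept IDs and the `TERM_TO_CONCEPT` mapping below are hypothetical; the point is only that a literal string search misses documents where the wording differs or the language changes, while a concept search does not.

```python
# Hypothetical documents: each has its text, its language and the concept IDs
# an indexer assigned to it.
DOCS = [
    {"id": "d1", "lang": "en", "text": "Europass and the transparency of qualifications", "concepts": {"C001"}},
    {"id": "d2", "lang": "en", "text": "Making skills visible across borders", "concepts": {"C001"}},
    {"id": "d3", "lang": "fr", "text": "La transparence des qualifications en Europe", "concepts": {"C001"}},
    {"id": "d4", "lang": "en", "text": "Funding of adult learning", "concepts": {"C002"}},
]
TERM_TO_CONCEPT = {"transparency of qualifications": "C001"}  # hypothetical


def literal_search(term, docs):
    """Match the exact character string in the text, as a plain text search does."""
    needle = term.casefold()
    return {d["id"] for d in docs if needle in d["text"].casefold()}


def concept_search(term, docs, term_to_concept):
    """Match the concept the descriptor stands for, whatever the wording or language."""
    concept = term_to_concept.get(term.casefold())
    return {d["id"] for d in docs if concept in d["concepts"]}


print(sorted(literal_search("transparency of qualifications", DOCS)))                    # ['d1']
print(sorted(concept_search("transparency of qualifications", DOCS, TERM_TO_CONCEPT)))  # ['d1', 'd2', 'd3']
```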
ETT is also used on the Cedefop website for the automatic categorisation or classification of documents, and at the Library’s reference desk to categorise users’ questions. A simple click gives cross-lingual access to the translation of a descriptor or of the complete semantic chain of a descriptor. These advanced options open the door to many cross-lingual applications, such as calculating document similarity across languages.
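As a rough illustration of cross-language document similarity, documents indexed with the same concept IDs can be compared regardless of the language they are written in. This sketch uses Jaccard similarity on descriptor sets; that measure is a swapped-in choice for illustration, since the slide does not say which measure Cedefop uses, and the concept IDs are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of concepts two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical descriptor sets, stored as language-independent concept IDs.
doc_en = {"C001", "C002", "C003"}  # an English report
doc_de = {"C001", "C002", "C004"}  # a German article
print(jaccard(doc_en, doc_de))     # 0.5
```

Because both documents are described by concept IDs rather than words, the comparison needs no translation of the texts themselves.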
Why
Indexing with the updated version of ETT
… knowing how something is stored makes finding it easier
How
The main, word-by-word alphabetical display is the most familiar, since it provides a variety of information for each descriptor. The term’s main entry in the alphabetical display shows the appropriate coordination.
This includes a scope note (SN), broader and narrower terms (BT, NT), USE and UF relations, and related terms (RT).
But be careful … this approach is easy to understand but not so easy for end-users: for example, the fact that BT and NT mean that two terms are related hierarchically is obvious only to specialists!
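A minimal sketch of how such a main entry might be rendered, assuming a simple text layout. The `Entry` fields mirror the indicators listed above, but the indentation and the order of the indicators are assumptions, not ETT's printed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Hypothetical main entry for one descriptor in the alphabetical display."""
    term: str
    sn: str = ""                                 # scope note
    bt: List[str] = field(default_factory=list)  # broader terms
    nt: List[str] = field(default_factory=list)  # narrower terms
    uf: List[str] = field(default_factory=list)  # non-preferred terms (Used For)
    rt: List[str] = field(default_factory=list)  # related terms


def render(entry: Entry) -> str:
    lines = [entry.term]
    if entry.sn:
        lines.append(f"  SN  {entry.sn}")
    # Layout and indicator order are assumptions, not ETT's printed format.
    for tag, terms in (("BT", entry.bt), ("NT", entry.nt), ("UF", entry.uf), ("RT", entry.rt)):
        for term in sorted(terms):
            lines.append(f"  {tag}  {term}")
    return "\n".join(lines)


print(render(Entry("Computers", bt=["Apparatus"], nt=["Digital computers"])))
```

The output shows why end-users can struggle: the meaning of the two-letter tags is not self-evident to non-specialists.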
How
Showing hierarchical structures to users is a useful mechanism for query expansion, also because …
- users with varying levels of domain knowledge make use of thesauri in different ways
- thesauri are capable of providing end-users with additional, useful terms for query formulation and expansion
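A hierarchy is easier for end-users to read as an indented tree than as BT/NT tags. The sketch below, a hypothetical helper `render_tree` over an assumed `narrower` mapping, prints a descriptor with all its narrower terms so users can pick additional terms for their query.

```python
def render_tree(term, narrower, indent="  "):
    """Indented view of a descriptor and all its narrower terms (NT), level by level."""
    lines = []

    def walk(current, depth, path):
        lines.append(indent * depth + current)
        for child in sorted(narrower.get(current, ())):
            if child not in path:  # guard against an accidental cycle in the data
                walk(child, depth + 1, path | {child})

    walk(term, 0, {term})
    return "\n".join(lines)


narrower = {  # hypothetical NT links
    "Apparatus": ["Computers"],
    "Computers": ["Digital computers", "Analogue computers"],
}
print(render_tree("Apparatus", narrower))
```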
How
How
A KWIC (KeyWord In Context) index is formed by sorting and aligning the words within article titles so that each word (except stop words) can be looked up alphabetically in the index. It was a useful indexing method for technical manuals before computerised full-text search became common. The term permuted index is another name for a KWIC index, referring to the fact that it indexes all cyclic permutations (rotations) of the headings: each rotation brings a different word to the front while keeping the remaining words in their original cyclic order.
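A minimal KWIC (permuted) index generator, as a sketch of the technique just described. The stop-word list is a small hypothetical one, and the "/" marking where the title wraps around is one common presentation choice, not a fixed standard.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}  # hypothetical list


def kwic(titles, stop_words=STOP_WORDS):
    """Return every rotation of every title that starts with a non-stop word, sorted."""
    rotations = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue
            head, tail = words[i:], words[:i]
            # "/" marks where the original title wrapped around.
            rotations.append(" ".join(head + (["/"] + tail if tail else [])))
    return sorted(rotations, key=str.lower)


for line in kwic(["Transparency of qualifications in Europe"]):
    print(line)
```

Each keyword appears once at the front of a line, with the rest of its title alongside it as context.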
Indexing with the updated version of ETT
• 465 new descriptors have been added to the thesaurus since the 2008 edition, so you cannot search earlier literature using these descriptors.
Older literature on topics represented by these terms is searchable using related descriptors.
How
• 415 deleted descriptors are no longer used in indexing, but they may be used for searching database entries made prior to ETT’s 2008 edition.
More recent literature on topics represented by these terms is searchable using related descriptors.
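The two rules above amount to choosing different descriptors depending on whether a record was indexed before or after the 2008 edition. A minimal sketch, assuming each descriptor carries a status of "new", "deleted" or anything else for unchanged; the function name and the related descriptor used in the example are hypothetical.

```python
def descriptors_by_period(descriptor, status, related):
    """Which descriptors to search for records indexed before/from the 2008 edition.

    status: "new" (added with the update), "deleted" (dropped), or anything
    else for an unchanged descriptor. `related` holds related descriptors.
    """
    if status == "new":
        # Older records were never indexed with the new descriptor.
        return {"before_2008_edition": sorted(related), "from_2008_edition": [descriptor]}
    if status == "deleted":
        # Newer records are no longer indexed with the deleted descriptor.
        return {"before_2008_edition": [descriptor], "from_2008_edition": sorted(related)}
    return {"before_2008_edition": [descriptor], "from_2008_edition": [descriptor]}


plan = descriptors_by_period(
    "certification of learning outcomes", "new", ["validation of learning"]  # related term is hypothetical
)
print(plan)
```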
How
Indexing with the updated version of ETT
How can I add the new descriptors using VET det?
1) Enter the new descriptors (pp. 16-19 of the printed version of ETT) in the Notes field, preceded by the word NEWDESCRIPTOR and separated by commas, e.g. Notes field: NEWDESCRIPTOR certification of learning outcomes, key competences
– If the new descriptor is a main descriptor, put NEWMAINDESCRIPTOR at the beginning
2) Do not enter the deleted descriptors (pp. 20-22 of the printed version of ETT)
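A small helper can build the Notes-field text and reject deleted descriptors before a record is saved. This is a hypothetical sketch: the function `notes_field` is illustrative, and joining a NEWMAINDESCRIPTOR part and a NEWDESCRIPTOR part with "; " is an assumption, since the slide does not say how the two are combined in one field.

```python
def notes_field(new_descriptors, main=None, deleted=frozenset()):
    """Build the Notes-field text for new ETT descriptors and refuse deleted ones.

    `deleted` should hold the deleted descriptors from the printed list.
    Joining the main and other parts with "; " is an assumption.
    """
    candidates = ([main] if main else []) + list(new_descriptors)
    rejected = [term for term in candidates if term in deleted]
    if rejected:
        raise ValueError(f"deleted descriptors must not be used: {rejected}")
    parts = []
    if main:
        parts.append(f"NEWMAINDESCRIPTOR {main}")
    if new_descriptors:
        parts.append("NEWDESCRIPTOR " + ", ".join(new_descriptors))
    return "; ".join(parts)


print(notes_field(["certification of learning outcomes", "key competences"]))
# NEWDESCRIPTOR certification of learning outcomes, key competences
```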
How
Fundamental, basic, classic indexing rules are really important, because VET-Bib contains 70,000 records!
Index ONLY what is in the document, and index at the LEVEL of specificity of the document
1. Statements or assumptions are not indexed
How
Fundamental indexing rules
2. Very general descriptors are not used unless the document covers a topic very broadly
3. The main descriptor covers the main focus or subject of a document
4. Other descriptors indicate less important aspects of the document
How
Fundamental indexing rules
5. Indexing with ETT avoids ‘indexing up’ to a broader descriptor when an appropriate, more specific one exists
How
Fundamental indexing rules
• Indexing is complementary to information found in other parts of the document
(mainly title and abstract)
How
• The number of descriptors should be proportional to the number of pages
How
Fundamental indexing rules
• “Indexable” concepts are translated into descriptors; using the thesaurus helps maintain consistency and prevents the proliferation of concepts
How
Fundamental indexing rules
• Thus a single descriptor may be imprecise, even ambiguous, while the greater the number of descriptors used together, the greater the precision
Fundamental indexing rules
How
• The word precision is used in a technical sense to mean the proportion of relevant documents in a retrieved set (relevant documents retrieved divided by all documents retrieved)
Fundamental indexing rules
How
• The word recall is used to mean the proportion of relevant documents that are retrieved (relevant documents retrieved divided by the relevant documents retrieved plus those that are relevant but not retrieved)
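Both measures can be computed directly from two sets: the records a search returned and the records that are actually relevant. A minimal sketch with hypothetical record numbers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records returned, 5 relevant records in the database.
print(precision_recall({1, 2, 3, 4}, {2, 4, 5, 6, 7}))  # (0.5, 0.4)
```

Half of what was returned is relevant (precision 0.5), but only two of the five relevant records were found (recall 0.4).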
How
Fundamental indexing rules
… for the future
Permitting the searcher to switch between navigating the thesaurus and searching the database can only improve access. An obvious way in which a thesaurus can be applied directly in retrieval is to use its relationships as a means of expanding the search. Research, however, has shown that these relationships must be used with caution (precision/recall)
What else …
… for the future
In general, expanding a search to include narrower terms tends to improve recall without great sacrifice in precision. Expanding to include broader or related terms, while it does improve recall, typically has a significant negative impact on precision.
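A minimal sketch of thesaurus-based query expansion that follows this guidance: narrower terms are added at every level by default, while broader and related terms are opt-in. Limiting BT and RT to one step is a design assumption made here to reflect the caution about precision; the `links` structure and the terms are hypothetical.

```python
def expand(term, links, include_bt=False, include_rt=False):
    """Return the search terms for `term` after thesaurus-based expansion.

    links[relation][term] -> set of terms, for relation in "NT", "BT", "RT".
    Narrower terms are followed down every level; BT and RT are opt-in and
    only one step deep (an assumption, to limit the loss of precision).
    """
    expanded = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for narrower in links.get("NT", {}).get(current, ()):
            if narrower not in expanded:
                expanded.add(narrower)
                stack.append(narrower)
    if include_bt:
        expanded |= set(links.get("BT", {}).get(term, ()))
    if include_rt:
        expanded |= set(links.get("RT", {}).get(term, ()))
    return expanded


links = {  # hypothetical thesaurus fragment
    "NT": {"Computers": {"Digital computers", "Analogue computers"}, "Digital computers": {"Laptops"}},
    "BT": {"Computers": {"Apparatus"}},
    "RT": {"Computers": {"Software"}},
}
print(sorted(expand("Computers", links)))
```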
What else …
… for the future
• How is it possible to remain positive about the need for continued use of thesauri?
Because only a thesaurus can become the basis of a more extensive semantic network that provides information not just on which terms are used in indexing but on how they are used within the system.
What else …