Controlled vocabularies
Michael Middleton, Faculty of Information Technology, Queensland University of Technology (www.fit.qut.edu.au), CRICOS No. 00213J


Controlled vocabularies: Thesauri and information retrieval
Michael Middleton, QUT School of Information Systems, Brisbane, Australia (m.middleton@qut.edu.au)
For STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005

Introduction
  • Context
  • History
  • Vocabulary principles
  • Thesaurus software
  • Thesaurus building and application
  • Thesaurus evaluation
  • The future

Context: Information life cycle
  Stages: create, distribute, use, maintain, recall, reuse, store, dispose
  Organise to maintain

Context: Information management
  Domains: operational, analytical, strategic

Context: Indexing
  Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document.
  • Assigned indexing
  • Derived indexing

Indexer qualities
  The art of assigned indexing:
  • Empathy
  • Meticulousness
  • Consistency
  • General knowledge
  • Patience

Indexing guidelines
  Conceptual analysis and assigning:
  • Aboutness
  • Elements of the document to consider
  • Exhaustivity
  • Specificity
  • Index what is in the item
  • Co-ordination
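Consistency is listed as an indexer quality, and the exercise on comparing indexing asks how consistent different indexers are. One widely used measure, not given on the slides, is Hooper's inter-indexer consistency, C / (A + B − C), where A and B are the numbers of terms assigned by two indexers and C is the number of terms they share. The following is a minimal sketch; the function name and both term sets are hypothetical.

```python
def hooper_consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Hooper's inter-indexer consistency: C / (A + B - C).

    A and B are the numbers of terms assigned by each indexer and C is the
    number of terms both assigned, so the result equals |a & b| / |a | b|.
    """
    # Compare terms case-insensitively so "Skills" and "skills" match.
    a = {t.casefold() for t in terms_a}
    b = {t.casefold() for t in terms_b}
    common = len(a & b)
    denominator = len(a) + len(b) - common
    if denominator == 0:
        raise ValueError("consistency is undefined when neither indexer assigned terms")
    return common / denominator


# Hypothetical indexing of the same paper by two indexers.
indexer_1 = {"Library education", "Graduates", "Skills", "Employer expectations"}
indexer_2 = {"library education", "skills", "Competencies"}

print(f"{hooper_consistency(indexer_1, indexer_2):.2f}")  # 2 / (4 + 3 - 2) = 0.40
```

A score of 1.0 means identical term sets and 0.0 means no overlap. Case-folding is a simplifying assumption; real comparisons would also need to reconcile synonyms, which is exactly what a controlled vocabulary is meant to prevent.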
Assigned index representations
  • Alphabetical subject
  • Classified (alphabetical, notation, chain)

Indexing exercise
  How consistent is database indexing? Example: the same paper in multiple databases:
  Middleton, M. Skills expectations of library graduates. http://eprints.qut.edu.au/archive/00000094/
  1. Index it yourself.
  2. Compare your indexing with others.
  3. Compare the indexing in ERIC and INSPEC.

Context: Metadata
  • Agent
  • Document description
  • Responsibility
  • Administrative
  • Provenance
  • Connections
  • Conditions of use
  • Content
  • Topic (application of vocabulary control)
  • Coverage
  • Role

Controlled vocabulary
  Thesaurus: a controlled vocabulary of terms in natural language that are designed for post-coordination.
  Classification scheme: a scheme for organising by category in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions. It often involves notation.

Purpose
  • Indexing: translating diverse natural language into consistent terminology
  • Establishing relationships among terms
  • Information retrieval: improving precision and recall
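A thesaurus is designed for post-coordination: documents are indexed with separate terms, and the searcher combines them at query time. The sketch below shows that idea together with the precision and recall measures named under Purpose. It assumes Boolean AND as the combining operator; the postings, document numbers and relevance judgments are all hypothetical.

```python
# Hypothetical postings: each thesaurus term maps to the documents indexed with it.
POSTINGS: dict[str, set[int]] = {
    "cameras": {1, 2, 3, 5},
    "underwater": {2, 4, 5},
    "photography": {1, 2, 6},
}


def post_coordinate(postings: dict[str, set[int]], *terms: str) -> set[int]:
    """Combine single-concept terms at search time with Boolean AND."""
    if not terms:
        return set()
    return set.intersection(*(postings.get(t, set()) for t in terms))


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {2, 5, 7}  # hypothetical judgments for a search on underwater cameras

broad = post_coordinate(POSTINGS, "cameras")                 # documents {1, 2, 3, 5}
narrow = post_coordinate(POSTINGS, "cameras", "underwater")  # documents {2, 5}
print(precision_recall(broad, relevant))   # precision 0.5, recall 2/3
print(precision_recall(narrow, relevant))  # precision 1.0, recall 2/3
```

Coordinating the second term raises precision here without losing recall. Document 7 is never retrieved because, in this toy index, it carries neither term, which is the kind of recall failure that careful assigned indexing tries to avoid.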
History
  • Bibliographic databases
  • Many applications; a list of associated online thesauri and classification schemes is at http://sky.fit.qut.edu.au/~middletm/cont_voc.html
  • Standards: ISO 2788 (monolingual thesauri); ISO 5964 (multilingual thesauri); ANSI Z39.19

Thesaurus principles
  • Term relationships
  • Continuing evolution
  • Internally consistent hierarchies to support database searching

The thesaurus
  The vocabulary of a controlled indexing language, formally organised so that a priori relationships between concepts are made explicit. A thesaurus is an example of metadata.

Thesaurus extract (ISO sample)
  (BT = broader term, NT = narrower term, RT = related term, SN = scope note, USE = use the preferred term)

  35 mm CAMERAS
      BT  MINIATURE CAMERAS
  CAMERAS
      BT  OPTICAL EQUIPMENT
      NT  MOVING PICTURE CAMERAS
          STEREO CAMERAS
          STILL CAMERAS
          UNDERWATER CAMERAS
      RT  PHOTOGRAPHY
  CINE CAMERAS
      BT  MOVING PICTURE CAMERAS
      NT  UNDERWATER CINE CAMERAS
      RT  CINEMA
  CINEMA
      RT  CINE CAMERAS
  DIVING
      RT  UNDERWATER CAMERAS
  INSTANT PICTURE CAMERAS
      SN  Cameras which produce a finished print directly
      BT  STILL CAMERAS
  Land cameras
      USE VIEW CAMERAS
  MICROSCOPES
      BT  OPTICAL EQUIPMENT
  MINIATURE CAMERAS
      BT  STILL CAMERAS
      NT  35 mm CAMERAS
  MOVING PICTURE CAMERAS
      BT  CAMERAS
      NT  CINE CAMERAS
          TELEVISION CAMERAS
  OPTICAL EQUIPMENT
      NT  CAMERAS
          MICROSCOPES
  PHOTOGRAPHY
      RT  CAMERAS
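Relationships in the ISO sample extract are reciprocal: because 35 mm CAMERAS has the BT MINIATURE CAMERAS, MINIATURE CAMERAS must list 35 mm CAMERAS as an NT, and RT links must appear on both sides. Thesaurus software normally maintains these reciprocals automatically. As a minimal sketch, assuming a hypothetical dict-of-dicts encoding of the extract (not the storage format of any particular package), an internal-consistency check might look like this. References to terms whose own entries fall outside the extract are skipped.

```python
# Inverse of each relationship: if A has R to B, then B should have INVERSE[R] to A.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

# The ISO sample extract, encoded as term -> relationship -> target terms.
# Scope notes are omitted because they are not relationships between terms.
EXTRACT: dict[str, dict[str, list[str]]] = {
    "35 mm CAMERAS": {"BT": ["MINIATURE CAMERAS"]},
    "CAMERAS": {"BT": ["OPTICAL EQUIPMENT"],
                "NT": ["MOVING PICTURE CAMERAS", "STEREO CAMERAS",
                       "STILL CAMERAS", "UNDERWATER CAMERAS"],
                "RT": ["PHOTOGRAPHY"]},
    "CINE CAMERAS": {"BT": ["MOVING PICTURE CAMERAS"],
                     "NT": ["UNDERWATER CINE CAMERAS"], "RT": ["CINEMA"]},
    "CINEMA": {"RT": ["CINE CAMERAS"]},
    "DIVING": {"RT": ["UNDERWATER CAMERAS"]},
    "INSTANT PICTURE CAMERAS": {"BT": ["STILL CAMERAS"]},
    "Land cameras": {"USE": ["VIEW CAMERAS"]},
    "MICROSCOPES": {"BT": ["OPTICAL EQUIPMENT"]},
    "MINIATURE CAMERAS": {"BT": ["STILL CAMERAS"], "NT": ["35 mm CAMERAS"]},
    "MOVING PICTURE CAMERAS": {"BT": ["CAMERAS"],
                               "NT": ["CINE CAMERAS", "TELEVISION CAMERAS"]},
    "OPTICAL EQUIPMENT": {"NT": ["CAMERAS", "MICROSCOPES"]},
    "PHOTOGRAPHY": {"RT": ["CAMERAS"]},
}


def missing_reciprocals(thesaurus: dict[str, dict[str, list[str]]]) -> list[tuple[str, str, str]]:
    """Return (term, relationship, target) triples whose reciprocal entry is missing."""
    problems = []
    for term, relations in thesaurus.items():
        for rel, targets in relations.items():
            inverse = INVERSE.get(rel)
            if inverse is None:
                continue  # not a reciprocal relationship
            for target in targets:
                if target not in thesaurus:
                    continue  # the target's own entry lies outside the extract
                if term not in thesaurus[target].get(inverse, []):
                    problems.append((term, rel, target))
    return problems


print(missing_reciprocals(EXTRACT))  # prints []; the extract is internally consistent

# Deleting one side of a relationship is caught.
broken = {**EXTRACT, "CINEMA": {}}
print(missing_reciprocals(broken))  # [('CINE CAMERAS', 'RT', 'CINEMA')]
```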
Standardising the vocabulary
  • Types of entities and forms of terms
  • Singular vs plural
  • Homonyms
  • Choice of terms
  • Scope notes and history notes

Compound terms
  Terms should be factored into simpler elements to improve users' understanding.
  • Semantic factoring
  • Syntactic factoring

Semantic relationships
  • Equivalence: establishing relationships between preferred (postable) and non-preferred (non-postable) terms
  • Hierarchical: establishing relationships between subordinate and superordinate terms. These may be distinguished as generic, whole-part or instance relationships.
  • Associative: establishing relationships between terms that are mentally associated but neither equivalent nor hierarchical

But the functions thesaurus...
  Whereas agenda papers might otherwise have the broader term documents, in a functions thesaurus agenda papers might have the broader term meetings.

Applying a functional thesaurus
  Top term: PERSONNEL
  Scope note: The function of managing all employees
  Related terms: COMPENSATION; ESTABLISHMENT; INDUSTRIAL RELATIONS; etc.
  Narrower terms: ALLOWANCES; APPEALS (Decisions); APPOINTMENT; ARRANGEMENTS; AUTHORISATION; COMMITTEES; COMPLIANCE; etc.
  Use-for terms: Employees; Public Servants; Staff
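The equivalence and hierarchical relationships are what allow a thesaurus to translate a searcher's words into consistent terminology. The sketch below assumes the PERSONNEL entry is encoded as a plain Python dictionary; the names PREFERRED, USE, preferred_term and expand are illustrative and do not come from any thesaurus package. A non-preferred term such as Staff is mapped to PERSONNEL, and a preferred term can then be expanded to its narrower terms.

```python
from typing import Optional

# The PERSONNEL entry as shown on the slide; the slide's RT and NT lists end
# in "etc", so this encoding is deliberately partial.
PERSONNEL = {
    "TT": "PERSONNEL",
    "SN": "The function of managing all employees",
    "RT": ["COMPENSATION", "ESTABLISHMENT", "INDUSTRIAL RELATIONS"],
    "NT": ["ALLOWANCES", "APPEALS (Decisions)", "APPOINTMENT", "ARRANGEMENTS",
           "AUTHORISATION", "COMMITTEES", "COMPLIANCE"],
    "UF": ["Employees", "Public Servants", "Staff"],
}

# Preferred (postable) terms, looked up case-insensitively.
PREFERRED = {t.casefold(): t for t in [PERSONNEL["TT"], *PERSONNEL["NT"], *PERSONNEL["RT"]]}
# Equivalence relationship: each non-preferred (UF) term leads to the preferred term.
USE = {t.casefold(): PERSONNEL["TT"] for t in PERSONNEL["UF"]}


def preferred_term(user_term: str) -> Optional[str]:
    """Translate a searcher's word into the preferred term, or None if unknown."""
    key = user_term.strip().casefold()
    return PREFERRED.get(key) or USE.get(key)


def expand(term: str) -> list[str]:
    """Add the narrower terms recorded under PERSONNEL (one level only)."""
    if term == PERSONNEL["TT"]:
        return [term, *PERSONNEL["NT"]]
    return [term]


print(preferred_term("staff"))        # PERSONNEL
print(preferred_term("Appointment"))  # APPOINTMENT
print(preferred_term("salary"))       # None: not in this partial entry
print(expand(preferred_term("Public Servants")))
```

Only one level of narrower terms is expanded, since the slide shows a single entry. In a full thesaurus the same lookup would walk NT links recursively through the whole hierarchy.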
Thesaurus display
  • Alphabetical hierarchies
      One level above and below the entry term
      Complete hierarchy for each term, or a separate TT (top term) display
  • Permuted term lists
  • Combination with classification notation
  • Graphic displays

Applying a thesaurus
  Download Term Tree (free trial download) from http://www.termtree.com.au

Thesaurus software
  • Assigned
  • Integrated database
  • Deriving terminology

Thesaurus software: assigned
  Terms are assigned by vocabulary specialists in an independent database.
  • a.k.a. (Synercon Management Consulting)
  • MultiTes
  • OpenCyc
  • SuperTHES
  • THESmain/THESshow for mono-/multilingual thesauri
  • Term Tree 2000
  • WebChoir
  • Wordmap

Thesaurus software: integrated database
  Terms are assigned by specialists; the thesaurus works like an active data dictionary to control the database.
  • BASIS
  • InMagic
  • Bibliotech PRO
  • BRS/Search
  • STAR

Thesaurus software for deriving terminology
  Terms are created automatically from text.
  • Entrieva SemioTagger and SemioMap, with SemioSkyline for viewing
  • Intology taxonomy builder
  • Verity Thematic Mapping
  • Autonomy taxonomy generation and categorization
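The products listed derive terms using their own, largely proprietary, methods. The sketch below does not reproduce any of them. It illustrates only the simplest general idea, under the assumption that frequently repeated words and two-word phrases containing no stopwords make good candidate terms, which a vocabulary specialist would then review before adding them to a thesaurus. The stopword list, function name and sample text are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools use much larger ones.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of",
             "on", "or", "the", "to", "which", "with"}


def candidate_terms(text: str, max_words: int = 2, min_count: int = 2) -> list[tuple[str, int]]:
    """Count word n-grams (up to max_words long) that contain no stopword.

    Sentences are processed separately so a phrase never spans a full stop.
    Returns (term, frequency) pairs seen at least min_count times, most frequent first.
    """
    counts = Counter()
    for sentence in re.split(r"[.;:!?]", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if not any(w in STOPWORDS for w in gram):
                    counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


sample = ("Underwater cameras need housings. "
          "Cine cameras and underwater cameras are moving picture cameras.")
print(candidate_terms(sample))
# [('cameras', 4), ('underwater', 2), ('underwater cameras', 2)]
```

The compound "underwater cameras" surfaces alongside its elements. Deciding whether to keep it as a single term or factor it into simpler elements is still the specialist's job, following the compound-term rules for standardising the vocabulary.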
Thesaurus building (1)
  • Users
      Define
      Identify needs
  • Define thesaurus range and depth
  • Raw vocabulary building
      Identify sources
      Collect and record terms

Thesaurus building (2)
  • Vocabulary organisation
      Cluster terms
      Establish relationships using symbols
  • Maintenance

Business application
  Not long-term collaborative efforts of classification specialists. Instead, adapt to business ch