2004.11.16 - SLIDE 1 IS 202 – FALL 2004 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2004 http://www.sims.berkeley.edu/academics/courses/is202/f04/ SIMS 202: Information Organization and Retrieval Lecture 22: Thesauri and Metadata


2004.11.16 - SLIDE 1IS 202 – FALL 2004

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2004

http://www.sims.berkeley.edu/academics/courses/is202/f04/

SIMS 202: Information Organization and Retrieval

Lecture 22: Thesauri and Metadata

2004.11.16 - SLIDE 2IS 202 – FALL 2004

Lecture Overview

• Review (and expansion)
  – Facetted Classification
  – Thesaurus Design and Development
• Metadata And Markup
  – XML As A Metadata Lingua Franca
    • Dublin Core Revisited
    • METS
    • Other Metadata schemas and protocols in XML
• Discussion

2004.11.16 - SLIDE 3IS 202 – FALL 2004

Lecture Overview

• Review (and expansion)
  – Facetted Classification
  – Thesaurus Design and Development
• Metadata And Markup
  – XML As A Metadata Lingua Franca
    • Dublin Core Revisited
    • METS
    • Other Metadata schemas and protocols in XML
• Discussion

2004.11.16 - SLIDE 4IS 202 – FALL 2004

Indexing Languages

• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents

• An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms

2004.11.16 - SLIDE 5IS 202 – FALL 2004

Controlled Vocabularies

• Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information

• That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata

2004.11.16 - SLIDE 6IS 202 – FALL 2004

Hierarchical Classification

[Tree diagram]

Literature
  English, French, Spanish
    Prose, Poetry, Drama (under each language)
      16th, 17th, 18th, 19th (centuries, under each genre)
      ...

Slide author: Marti Hearst

2004.11.16 - SLIDE 7IS 202 – FALL 2004

Labeled Categories for Hierarchical Classification

• LITERATURE
  – 100 English Literature
    • 110 English Prose
      – English Prose 16th Century
      – English Prose 17th Century
      – English Prose 18th Century
      – ...
    • 111 English Poetry
      – 121 English Poetry 16th Century
      – 122 English Poetry 17th Century
      – ...
    • 112 English Drama
      – 130 English Drama 16th Century
      – …
  – 200 French Literature

Slide author: Marti Hearst

2004.11.16 - SLIDE 8IS 202 – FALL 2004

Facetted Categories

• Mutually exclusive
  – Non-overlapping, distinct categories
• Relational
  – Relations between facets, subfacets, and foci (elements) are not restricted to hierarchical generalization-specialization relations
• Composable
  – Combined using grammars of order and relation to form compound descriptions

2004.11.16 - SLIDE 9IS 202 – FALL 2004

Facetted Classification Along With Labeled Categories

• A Language
  – a English
  – b French
  – c Spanish
• B Genre
  – a Prose
  – b Poetry
  – c Drama
• C Period
  – a 16th Century
  – b 17th Century
  – c 18th Century
  – d 19th Century

• Aa English Literature
• AaBa English Prose
• AaBaCa English Prose 16th Century
• AbBbCd French Poetry 19th Century
• BcCd Drama 19th Century

Slide author: Marti Hearst
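The notation on this slide can be read mechanically: each capital letter names a facet and the lowercase letter after it names a focus within that facet. A minimal sketch of such a decoder, assuming the three facet tables above; the function name `decode` and the rule that a bare language code means that language's literature are illustrative choices, not part of the lecture.

```python
import re

# Facet tables copied from the slide; the decoder itself is only illustrative.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation: str) -> str:
    """Turn a compound code such as 'AaBaCa' into a readable label."""
    pairs = re.findall(r"([A-Z])([a-z])", notation)
    # Reject input that is not a clean sequence of facet/focus letter pairs.
    if "".join(facet + focus for facet, focus in pairs) != notation:
        raise ValueError(f"malformed notation: {notation!r}")
    labels = [FACETS[facet][1][focus] for facet, focus in pairs]
    # Assumption: a bare language code denotes that language's literature.
    if len(pairs) == 1 and pairs[0][0] == "A":
        labels.append("Literature")
    return " ".join(labels)


print(decode("AaBaCa"))
print(decode("AbBbCd"))
```

Because every code is built from the same small tables, a new combination needs no new vocabulary entry, only a new string of letter pairs.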

2004.11.16 - SLIDE 10IS 202 – FALL 2004

Ranganathan

• PMEST Facets
  – P(ersonality)
    • WHO: Types of things
  – M(atter)
    • WHAT: Constituent materials
  – E(nergy)
    • HOW: Action or activity terms
  – S(pace)
    • WHERE: Where things occur
  – T(ime)
    • WHEN: When things occur

2004.11.16 - SLIDE 11IS 202 – FALL 2004

“Classical” Facet Analysis

• What is being done?
  – Entity
  – Kind
  – Product
  – By-Product
• What are its parts?
  – Part
• What are its properties?
  – Property
  – Material
• How is this achieved?
  – Process
• By what means?
  – Operation
• By whom?
  – Agent
  – Patient
• Where?
  – Space
• When?
  – Time

2004.11.16 - SLIDE 12IS 202 – FALL 2004

Semantic and Syntactic Relationships

• Semantic relationships
  – Is-A (thing/kind, genus/species)
    • Mammals
      – Primates
        » Humans
  – Has-Parts
    • Human
      – Head
        » Eyes
• Syntactic relationships
  – Compounds
    • Wheat + harvesting = “wheat harvesting”
    • Object + operation = operation on object

2004.11.16 - SLIDE 13IS 202 – FALL 2004

Facetted Classification

• Clearly distinguishes between semantic relationships and syntactic relationships
  – Semantic relationships
    • Within a facet
    • Containment relations
  – Syntactic relationships
    • Across facets
    • Combinatoric relations
• Has a “syntax” for syntactic combination of semantic terms

2004.11.16 - SLIDE 14IS 202 – FALL 2004

Power of Facet Combinations

• The syntactic relations of facetted classifications enable a small controlled vocabulary to produce
  – Many, many structured descriptions
  – Complex, but formally structured descriptions using nested compound descriptions
  – Descriptions for things we do not have words for
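To put a rough number on "many, many", the sketch below counts the compound descriptions that the three-facet literature example (Language, Genre, Period) can produce. It assumes a deliberately simple syntax, at most one focus per facet in a fixed facet order; real facetted schemes allow richer, nested combinations, so this is a lower bound on the idea rather than a model of any actual scheme.

```python
from itertools import product

# Facets and foci from the literature example earlier in the lecture.
FACETS = {
    "Language": ["English", "French", "Spanish"],
    "Genre": ["Prose", "Poetry", "Drama"],
    "Period": ["16th Century", "17th Century", "18th Century", "19th Century"],
}


def compound_descriptions(facets):
    """Yield every non-empty description using at most one focus per facet."""
    # None stands for "this facet is not used in the description".
    options = [[None] + foci for foci in facets.values()]
    for combo in product(*options):
        chosen = [focus for focus in combo if focus is not None]
        if chosen:
            yield " ".join(chosen)


vocabulary_size = sum(len(foci) for foci in FACETS.values())
descriptions = list(compound_descriptions(FACETS))
print(vocabulary_size, "terms yield", len(descriptions), "descriptions")
```

Ten controlled terms give (3+1) × (3+1) × (4+1) − 1 = 79 distinct descriptions here, and the count grows multiplicatively with every facet added.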

2004.11.16 - SLIDE 15IS 202 – FALL 2004

Lecture Overview

• Review (and expansion)
  – Facetted Classification
  – Thesaurus Design and Development
• Metadata And Markup
  – XML As A Metadata Lingua Franca
    • Dublin Core Revisited
    • METS
    • Other Metadata schemas and protocols in XML
• Discussion

2004.11.16 - SLIDE 16IS 202 – FALL 2004

Types of Indexing Languages

• Uncontrolled keyword indexing

• Indexing languages
  – Controlled, but not structured

• Thesauri
  – Controlled and structured

• Classification systems
  – Controlled, structured, and coded

• Facetted classification systems

2004.11.16 - SLIDE 17IS 202 – FALL 2004

Thesauri

• A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower and other related terms
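One way to picture this definition is as a small data structure: each preferred term carries sets of links to its equivalent, broader, narrower, and related terms. The sketch below is an assumption-laden illustration (the class and method names are invented here, and the sample terms are borrowed from the Is-A example earlier in the lecture), not the format of any particular thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One preferred term (descriptor) and its links to other terms."""
    term: str
    used_for: set = field(default_factory=set)   # synonymous / equivalent terms
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)


class Thesaurus:
    def __init__(self):
        self.entries = {}

    def entry(self, term):
        """Return the entry for a term, creating it on first use."""
        return self.entries.setdefault(term, Entry(term))

    def add_broader(self, term, broader_term):
        """Record a broader/narrower link in both directions."""
        self.entry(term).broader.add(broader_term)
        self.entry(broader_term).narrower.add(term)

    def add_related(self, term, other):
        """Related-term links are symmetric."""
        self.entry(term).related.add(other)
        self.entry(other).related.add(term)

    def ancestors(self, term):
        """All terms reachable by following broader links upward."""
        seen, stack = set(), [term]
        while stack:
            for parent in self.entry(stack.pop()).broader:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


t = Thesaurus()
t.add_broader("Humans", "Primates")
t.add_broader("Primates", "Mammals")
t.entry("Humans").used_for.add("People")
print(sorted(t.ancestors("Humans")))
```

Storing each link in both directions keeps the reciprocal references consistent, which is the kind of check a thesaurus editor would otherwise do by hand.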

2004.11.16 - SLIDE 18IS 202 – FALL 2004

Thesaurus Standards

• National and International Standards for Thesauri
  – ANSI/NISO Z39.19-1994: American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
  – ANSI/NISO Draft Standard Z39.4-199x: American National Standard Guidelines for Indexes in Information Retrieval
  – ISO 2788: Documentation: Guidelines for the establishment and development of monolingual thesauri
  – ISO 5964: Documentation: Guidelines for the establishment and development of multilingual thesauri

2004.11.16 - SLIDE 19IS 202 – FALL 2004

Thesaurus Examples

• Examples
  – The ERIC Thesaurus of Descriptors
  – The Medical Subject Headings (MESH) of the National Library of Medicine
  – The Art and Architecture Thesaurus

2004.11.16 - SLIDE 20IS 202 – FALL 2004

ERIC Thesaurus – Entry

2004.11.16 - SLIDE 21IS 202 – FALL 2004

ERIC Thesaurus – Alphabetic

2004.11.16 - SLIDE 22IS 202 – FALL 2004

ERIC Thesaurus – KWIC Index

2004.11.16 - SLIDE 23IS 202 – FALL 2004

ERIC Thesaurus – Hierarchies

2004.11.16 - SLIDE 24IS 202 – FALL 2004

ERIC Thesaurus – Groups

2004.11.16 - SLIDE 25IS 202 – FALL 2004

ERIC Thesaurus – Online

http://www.ericfacility.net/extra/pub/thessearch.cfm

2004.11.16 - SLIDE 26IS 202 – FALL 2004

MESH – Entry

2004.11.16 - SLIDE 27IS 202 – FALL 2004

MESH – Alphabetic

2004.11.16 - SLIDE 28IS 202 – FALL 2004

MESH – Tree Structures

2004.11.16 - SLIDE 29IS 202 – FALL 2004

MESH – KWOC Index

2004.11.16 - SLIDE 30IS 202 – FALL 2004

MESH - Online

http://www.nlm.nih.gov/mesh/meshhome.html

2004.11.16 - SLIDE 31IS 202 – FALL 2004

AAT – Facets

2004.11.16 - SLIDE 32IS 202 – FALL 2004

AAT – Hierarchies (print)

2004.11.16 - SLIDE 33IS 202 – FALL 2004

AAT – Hierarchies (online)

http://www.getty.edu/research/tools/vocabulary/aat/

2004.11.16 - SLIDE 34IS 202 – FALL 2004

AAT – Entry (online)

2004.11.16 - SLIDE 35IS 202 – FALL 2004

Why Develop a Thesaurus?

• To provide a conceptual structure or “space” for a body of information
  – To make it possible to adequately describe the topical content of information resources at an appropriate level of generality or specificity
  – To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material)

2004.11.16 - SLIDE 36IS 202 – FALL 2004

Why Develop a Thesaurus?

• To provide vocabulary (or terminological) control
  – When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with
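The "lead the indexer or searcher to the appropriate concept" behavior can be sketched as a lookup table from every entry term to its preferred term. The vocabulary below is hypothetical, invented only for this example, and the helper names `build_lead_in` and `resolve` are assumptions rather than anything from the readings.

```python
def build_lead_in(preferred_to_variants):
    """Map every variant (and the preferred term itself) to its preferred term."""
    lookup = {}
    for preferred, variants in preferred_to_variants.items():
        for term in [preferred, *variants]:
            key = term.casefold().strip()
            # A variant must never point at two different descriptors.
            if key in lookup and lookup[key] != preferred:
                raise ValueError(f"{term!r} leads to two different descriptors")
            lookup[key] = preferred
    return lookup


def resolve(lookup, term):
    """Return the preferred term for whatever the user typed, or None."""
    return lookup.get(term.casefold().strip())


# Hypothetical vocabulary, invented for this example.
VOCAB = {
    "Automobiles": ["cars", "motor cars", "autos"],
    "Adolescents": ["teenagers", "teens", "youth"],
}
lookup = build_lead_in(VOCAB)
print(resolve(lookup, "Cars "))
print(resolve(lookup, "bicycles"))
```

A `None` result is itself useful: it marks a term the searcher tried that the thesaurus does not yet cover.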

2004.11.16 - SLIDE 37IS 202 – FALL 2004

Preliminary Considerations

• What is used now?
  – Continue using an existing thesaurus?
  – Ad hoc modification of existing thesaurus?
  – Develop a new well-structured thesaurus?

• What is the scope and complexity of the subject field?

• What kind of retrieval objects or data will be dealt with?

• How exhaustive and specific is the desired description of objects?

2004.11.16 - SLIDE 38IS 202 – FALL 2004

Preliminary Considerations

• The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus
  – It is better to plan for a larger and more comprehensive system than a smaller system that rapidly will become inadequate as the database grows

• Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists

2004.11.16 - SLIDE 39IS 202 – FALL 2004

Development of a Thesaurus

• Term selection

• Merging and development of concept classes

• Definition of broad subject fields and subfields

• Development of classificatory structure

• Review, testing, application, revision

2004.11.16 - SLIDE 40IS 202 – FALL 2004

Flow of Work in Thesaurus Construction

[Flowchart; boxes listed in reading order, decision boxes end with "?"]

• Select Sources
• Assign codes
• Select Terms
• Record Selected Terms
• Sort Terms
• Merge identical Terms
• Define Broad Subject Fields
• Merge Terms in Same Concept class
• Sort Terms into Broad Subject Fields
• Define Subfields within one Subject Field
• Work out detailed structure of the Subject Field
• Select Preferred Terms
• All Subfields of Broad Subject finished? (Yes / No)
• All Broad Subjects finished? (Yes / No)
• Improve Class Structure
• Print Classified Index and review
• Discuss with Experts and Users
• Select descriptors and checklist items
• Produce Full Thesaurus and Check references
• Assign Notation
• Review and Test
• Many Modifications? (Yes / No)
• Revise as needed

Based on Soergel, pp. 327-333

2004.11.16 - SLIDE 41IS 202 – FALL 2004

1. Term Selection

• Select sources for the collection of terms
  – Prearranged Sources
  – Open-ended Sources
• Assign codes to each source
• Selection of terms
  – For part of pre-arranged and for all open-ended sources
• Enter terms into database with all information

2004.11.16 - SLIDE 42IS 202 – FALL 2004

1.1 Kinds of Sources

• Prearranged Sources
  – Existing descriptor lists, classification schemes, thesauri
    • This includes universal schemes like DDC or LCSH
  – Nomenclatures of single disciplines
  – Treatises on the terminology of a field
  – Encyclopedias, lexica, dictionaries and glossaries
  – Tables of contents of textbooks and handbooks
  – Indexes of journals or abstracting journals
  – Indexes of other publications in the field

2004.11.16 - SLIDE 43IS 202 – FALL 2004

1.1 Kinds of Sources

• Open-ended sources
  – Lists of search requests or interest profiles
  – Description of projects/activities to be served by the information retrieval system
  – Discussion with specialists in the field
  – Sample of documents in the field
    • Ask users why and how these documents relate to the field
    • Have documents indexed by experts in the field
  – Lists of titles of documents in the field
  – Abstracts and reviews of documents
  – Your own knowledge

2004.11.16 - SLIDE 44IS 202 – FALL 2004

Selection of Sources

• Prearranged sources require less effort in gathering the material, and may already indicate some relationships among terms and between terms and concepts

• Open-ended sources can reflect current terminology and may provide more complete coverage

• Choose a set of sources that are current, as complete as possible, and considered authoritative

2004.11.16 - SLIDE 45IS 202 – FALL 2004

Selection of Terms

• In open-ended sources you read through the source and pick out terms (i.e. words and phrases) that might be useful in retrieval or as references to other terms

• Alternatively, use keyword and phrase extraction software to create lists of terms and select from those

• Transfer selected terms to the recording medium (cards or database)
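As a rough idea of what "keyword and phrase extraction software" might produce, the sketch below counts single words and two-word phrases in a handful of document titles, skipping anything that contains a stopword. The stopword list, the titles, and the function name `candidate_terms` are all made up for illustration; the output is only a candidate list from which a person still selects terms.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real software would use a much longer one.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is",
             "of", "on", "or", "the", "to", "with"}


def candidate_terms(texts, max_words=2):
    """Count single words and short phrases that contain no stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if not any(word in STOPWORDS for word in gram):
                    counts[" ".join(gram)] += 1
    return counts


# Hypothetical titles standing in for an open-ended source.
titles = [
    "Thesaurus construction for information retrieval",
    "Evaluation of information retrieval systems",
    "Controlled vocabularies and thesaurus design",
]
counts = candidate_terms(titles)
for term, n in counts.most_common(5):
    print(n, term)
```

Frequency alone does not decide usefulness; the counts simply give the human selector a shorter list to read than the sources themselves.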

2004.11.16 - SLIDE 46IS 202 – FALL 2004

Work Form – Still relevant??

From Soergel, p. 399

2004.11.16 - SLIDE 47IS 202 – FALL 2004

2. Merging and Development of Concept Classes

• Sort Term DB into alphabetical order

• First Round
  – Merge information for identical terms, possibly pulling info from additional sources

• Second Round
  – Merge synonyms or terms in the same concept class
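The two merging rounds can be sketched on a small term database. The records, source codes, and synonym groups below are hypothetical, and the second round assumes an editor has already decided which terms belong to the same concept class; the code only pools the information, it does not make that judgment.

```python
from collections import defaultdict

# (term, source code) records as they might come out of term selection;
# the terms and codes are hypothetical.
records = [
    ("Teenagers", "S1"), ("adolescents", "S2"), ("teenagers", "S3"),
    ("Youth", "S2"), ("Secondary schools", "S1"), ("High schools", "S3"),
]


def merge_identical(records):
    """First round: sort, then pool the source codes of identical terms."""
    merged = defaultdict(set)
    for term, source in sorted(records, key=lambda r: r[0].casefold()):
        merged[term.casefold()].add(source)
    return dict(merged)


def merge_concept_classes(terms, synonym_groups):
    """Second round: pool terms an editor has judged to share a concept."""
    classes, assigned = [], set()
    for group in synonym_groups:
        members = {t for t in group if t in terms}
        if members:
            sources = set().union(*(terms[t] for t in members))
            classes.append({"terms": sorted(members), "sources": sorted(sources)})
            assigned |= members
    # Terms with no known synonyms form a concept class of their own.
    for term in sorted(set(terms) - assigned):
        classes.append({"terms": [term], "sources": sorted(terms[term])})
    return classes


terms = merge_identical(records)
classes = merge_concept_classes(
    terms,
    [{"teenagers", "adolescents", "youth"}, {"secondary schools", "high schools"}],
)
for concept in classes:
    print(concept["terms"], concept["sources"])
```

Keeping the source codes through both rounds preserves the evidence for each term, which helps later when preferred terms are selected.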

2004.11.16 - SLIDE 48IS 202 – FALL 2004

3. Definition of Broad Subject Fields and Subfields

• Define broad subject fields and sort terms into these broad fields

• Define subfields within each broad field and sort terms into these subfields

• Work out the detailed structure
  – Select preferred terms
  – Merge information for terms in the same concept class
• Repeat these steps
  – For each subfield within a broad field
  – And for each broad field
  – Until all terms have been consolidated and preferred terms selected

2004.11.16 - SLIDE 49IS 202 – FALL 2004

4. Development of Classificatory Structure

• Produce preliminary version of classified index and update the working database

• Improve classificatory structure

• Reality check
  – Produce and distribute a version of the classified index
  – Distribute to users/experts

2004.11.16 - SLIDE 50IS 202 – FALL 2004

5. Final Stages

• Review

• Testing

• Application

• Revision

2004.11.16 - SLIDE 51IS 202 – FALL 2004

Review

• Discuss classified index with users/experts
  – Select descriptors and checklist descriptors

• Assign notational symbols

• Produce main thesaurus and indexes

2004.11.16 - SLIDE 52IS 202 – FALL 2004

Testing a Thesaurus

• Assign descriptors to a sample set of NEW documents (use enough to get an idea of any gaps in the thesaurus)

• Test retrieval using sample questions, and see how effectively the thesaurus maps the terms in those questions to the appropriate descriptors
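Both tests can be turned into simple tallies. The sketch below is one possible way to do that, with invented documents, descriptors, and questions: it lists the terms an indexer wanted but the thesaurus lacked, and computes the share of sample question terms that lead to some descriptor. The function names and the choice of a plain ratio as the measure are assumptions, not a method from the readings.

```python
def indexing_gaps(sample_docs, descriptors):
    """Terms an indexer wanted to assign that the thesaurus does not offer."""
    gaps = {}
    for doc_id, wanted in sample_docs.items():
        missing = sorted(t for t in wanted if t not in descriptors)
        if missing:
            gaps[doc_id] = missing
    return gaps


def mapping_rate(questions, lookup):
    """Share of sample question terms that lead to some descriptor."""
    terms = [t for question in questions for t in question]
    hits = sum(1 for t in terms if t in lookup)
    return hits / len(terms) if terms else 0.0


# Hypothetical thesaurus and test material.
descriptors = {"Automobiles", "Adolescents"}
lookup = {"cars": "Automobiles", "automobiles": "Automobiles",
          "teenagers": "Adolescents"}
sample_docs = {"doc1": ["Automobiles"], "doc2": ["Adolescents", "Bicycles"]}
questions = [["cars", "teenagers"], ["bicycles", "automobiles"]]

gaps = indexing_gaps(sample_docs, descriptors)
rate = mapping_rate(questions, lookup)
print(gaps, rate)
```

Here both tallies point at the same gap ("bicycles"), which is the sort of signal that feeds the revision step.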

2004.11.16 - SLIDE 53IS 202 – FALL 2004

Lecture Overview

• Review (and expansion)
  – Facetted Classification
  – Thesaurus Design and Development
• Metadata And Markup
  – XML As A Metadata Lingua Franca
    • Dublin Core Revisited
    • METS
    • Other Metadata schemas and protocols in XML
• Discussion

2004.11.16 - SLIDE 54IS 202 – FALL 2004

XML as a common syntax

• XML (and SGML) provide a way of expressing the structure of documents that can be verified and validated by document processing systems

• “Documents” can be metadata structures
  – Such as the description of a particular photograph in our Phone project

• XML thus provides a way of representing metadata descriptions as well as the content that they describe

2004.11.16 - SLIDE 55IS 202 – FALL 2004

XML as a common syntax

• All XML documents follow some simple rules that make them interchangeable and usable across different systems
  – All data and markup is in Unicode
  – All elements are marked by begin and end tags
  – All markup is case-sensitive
  – XML DTDs and/or Schemas define the valid structure (and sometimes content) of the documents
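The tag-matching and case-sensitivity rules are easy to observe with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree`, which checks well-formedness only (it does not validate against a DTD or Schema); the `photo` and `title` element names and the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML; no DTD or Schema validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical metadata fragments.
good = "<photo><title>Campanile</title></photo>"
bad_case = "<photo><Title>Campanile</title></photo>"      # Title vs. title
bad_nesting = "<photo><title>Campanile</photo></title>"   # tags overlap

for sample in (good, bad_case, bad_nesting):
    print(is_well_formed(sample), sample)
```

Checking that a well-formed document also has the right structure is the separate job of a validating parser working from the DTD or Schema.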

2004.11.16 - SLIDE 56IS 202 – FALL 2004

Dublin Core

• Review…

• Simple metadata for describing internet resources

• For “Document-Like Objects”

• 15 Elements

2004.11.16 - SLIDE 57IS 202 – FALL 2004

Dublin Core Elements

• Title
• Creator
• Subject
• Description
• Publisher
• Other Contributors
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management
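A Dublin Core description is simply a flat set of these elements, any of which may repeat. The sketch below builds one with Python's standard `xml.etree.ElementTree`, using the Dublin Core element namespace `http://purl.org/dc/elements/1.1/` and the short element names that appear in the DTD later in the lecture; the `record` wrapper element and the helper name `dc_record` are assumptions made for the example, and the field values are taken from this lecture's title slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 element names as they are spelled in the DC XML DTD.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_record(fields, wrapper="record"):
    """Build a flat record with one dc: child per (element, value) pair."""
    root = ET.Element(wrapper)  # the wrapper name is a placeholder
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dc_record([
    ("title", "SIMS 202: Information Organization and Retrieval"),
    ("creator", "Ray Larson"),
    ("creator", "Marc Davis"),
    ("date", "2004-11-16"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Repeating `creator` rather than packing both names into one element keeps each value separately searchable.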

2004.11.16 - SLIDE 58IS 202 – FALL 2004

DC XML DTD Implementation

• There have been various versions

• This one is the one recommended (required) by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

• Uses XML namespaces

• Available at http://dublincore.org/documents/2001/09/20/dcmes-xml/

2004.11.16 - SLIDE 59IS 202 – FALL 2004

DC Element and Attribute Definitions

<!-- The elements from DCMES 1.1 -->

<!-- The name given to the resource. -->
<!ELEMENT dc:title (#PCDATA)>
<!ATTLIST dc:title xml:lang CDATA #IMPLIED>

<!-- An entity primarily responsible for making the content of the resource. -->
<!ELEMENT dc:creator (#PCDATA)>
<!ATTLIST dc:creator xml:lang CDATA #IMPLIED>

<!-- The topic of the content of the resource. -->
<!ELEMENT dc:subject (#PCDATA)>
<!ATTLIST dc:subject xml:lang CDATA #IMPLIED>

<!-- An account of the content of the resource. -->
<!ELEMENT dc:description (#PCDATA)>
<!ATTLIST dc:description xml:lang CDATA #IMPLIED>

<!-- The entity responsible for making the resource available. -->
<!ELEMENT dc:publisher (#PCDATA)>
<!ATTLIST dc:publisher xml:lang CDATA #IMPLIED>

<!-- An entity responsible for making contributions to the content of the resource. -->
<!ELEMENT dc:contributor (#PCDATA)>
<!ATTLIST dc:contributor xml:lang CDATA #IMPLIED>

<!-- A date associated with an event in the life cycle of the resource. -->
<!ELEMENT dc:date (#PCDATA)>
<!ATTLIST dc:date xml:lang CDATA #IMPLIED>

2004.11.16 - SLIDE 60IS 202 – FALL 2004

DC Element Definitions (cont.)

<!-- The nature or genre of the content of the resource. -->
<!ELEMENT dc:type (#PCDATA)>
<!ATTLIST dc:type xml:lang CDATA #IMPLIED>

<!-- The physical or digital manifestation of the resource. -->
<!ELEMENT dc:format (#PCDATA)>
<!ATTLIST dc:format xml:lang CDATA #IMPLIED>

<!-- An unambiguous reference to the resource within a given context. -->
<!ELEMENT dc:identifier (#PCDATA)>
<!ATTLIST dc:identifier xml:lang CDATA #IMPLIED>
<!ATTLIST dc:identifier rdf:resource CDATA #IMPLIED>

<!-- A Reference to a resource from which the present resource is derived. -->
<!ELEMENT dc:source (#PCDATA)>
<!ATTLIST dc:source xml:lang CDATA #IMPLIED>
<!ATTLIST dc:source rdf:resource CDATA #IMPLIED>

<!-- A language of the intellectual content of the resource. -->
<!ELEMENT dc:language (#PCDATA)>
<!ATTLIST dc:language xml:lang CDATA #IMPLIED>

<!-- A reference to a related resource. -->
<!ELEMENT dc:relation (#PCDATA)>
<!ATTLIST dc:relation xml:lang CDATA #IMPLIED>
<!ATTLIST dc:relation rdf:resource CDATA #IMPLIED>

<!-- The extent or scope of the content of the resource. -->
<!ELEMENT dc:coverage (#PCDATA)>
<!ATTLIST dc:coverage xml:lang CDATA #IMPLIED>

<!-- Information about rights held in and over the resource. -->
<!ELEMENT dc:rights (#PCDATA)>
<!ATTLIST dc:rights xml:lang CDATA #IMPLIED>
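Every element above is plain text (`#PCDATA`) with an optional `xml:lang` attribute, so a consumer only has to collect element text and, when it matters, filter by language. A small reading sketch with Python's standard `xml.etree.ElementTree`; the sample record, its `record` wrapper element, and the French title are invented for illustration, and the parser used here does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical record; only the dc: element names come from the DTD.
SAMPLE = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">Information Organization and Retrieval</dc:title>
  <dc:title xml:lang="fr">Organisation et recherche d'information</dc:title>
  <dc:creator>Ray Larson</dc:creator>
</record>
"""


def values(root, element, lang=None):
    """Text of every dc:<element>, optionally restricted to one xml:lang."""
    found = root.findall(f"{{{DC_NS}}}{element}")
    if lang is not None:
        found = [e for e in found if e.get(XML_LANG) == lang]
    return [e.text for e in found]


root = ET.fromstring(SAMPLE.strip())
print(values(root, "title"))
print(values(root, "title", lang="fr"))
```

Because `xml:lang` is `#IMPLIED`, elements without it (like the creator here) are still valid and are simply skipped by a language filter.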

2004.11.16 - SLIDE 61IS 202 – FALL 2004

A More Complex SGML DTD

<!DOCTYPE USMARC [
<!-- USMARC DTD. UCB-SLIS v.0.08 -->
<!-- By Jerome P. McDonough, April 1, 1994 -->
<!ELEMENT USMARC - - (Leader, Directry, VarFlds)>
<!ATTLIST USMARC
    Material (BK|AM|CF|MP|MU|VM|SE) "BK"
    id CDATA #IMPLIED>
<!-- Author's Note: the id attribute for the USMARC element is intended to hold a unique record number for each MARC record in the local database. That is to say, it is intended ONLY as an aid in maintaining the local database of MARC records -->

<!ELEMENT Leader - O (LRL, RecStat, RecType, BibLevel, UCP, IndCount, SFCount, BaseAddr, EncLevel, DscCatFm, LinkRec, EntryMap)>
<!ELEMENT Directry - O (#PCDATA)>
<!ELEMENT VarFlds - O (VarCFlds, VarDFlds)>

<!-- Component parts of Leader -->
<!-- Logical Record Length -->
<!ELEMENT LRL - O (#PCDATA)>
…etc…

2004.11.16 - SLIDE 62IS 202 – FALL 2004

More Complex DTD (cont.)

<!-- Variable Data Fields -->
<!ELEMENT VarDFlds - O (NumbCode, MainEnty?, Titles, EdImprnt?, PhysDesc?, Series?, Notes?, SubjAccs?, AddEnty?, LinkEnty?, SAddEnty?, HoldAltG?, Fld9XX?)>

<!-- Component Parts of Variable Data Fields -->
<!-- Numbers & Codes -->
<!ELEMENT NumbCode - O (Fld010?, Fld011?, Fld015?, Fld017*, Fld018?, Fld019*, Fld020*, Fld022*, Fld023*, Fld024*, Fld025*, Fld027*, Fld028*, Fld029*, Fld030*, Fld032*, Fld033*, Fld034*, Fld035*, Fld036?, Fld037*, Fld039*, Fld040?, Fld041?, Fld042?, Fld043?, Fld044?, Fld045?, Fld046?, Fld047?, Fld048*, Fld050*, Fld051*, Fld052*, Fld055*, Fld060*, Fld061*, Fld066?, Fld069*, Fld070*, Fld071*, Fld072*, Fld074*, Fld080?, Fld082*, Fld084*, Fld086*, Fld088*, Fld090*, Fld096*)>

<!-- Main Entries -->
<!ELEMENT MainEnty - O (Fld100?, Fld110?, Fld111?, Fld130?)>

<!-- Titles -->
<!ELEMENT Titles - O (Fld210?, Fld211*, Fld212*, Fld214*, Fld222*, Fld240?, Fld242*, Fld243?, Fld245, Fld246*, Fld247*)>

<!-- Edition, Imprint, etc. -->
<!ELEMENT EdImprnt - O (Fld250?, Fld254?, Fld255*, Fld256?, Fld257?, Fld260?, Fld261?, Fld262?, Fld263?, Fld265?)>

<!-- Physical Description, etc. -->
<!ELEMENT PhysDesc - O (Fld300*, Fld305*, Fld306?, Fld310?, Fld315?, Fld321*, Fld340*, Fld350?, Fld351*, Fld355*, Fld357*, Fld362*)>

…etc…

2004.11.16 - SLIDE 63IS 202 – FALL 2004

Complex DTD (cont.)

<!-- Title Statement -->
<!ELEMENT Fld245 - O (Six?, (a|b|c|f|g|h|k|n|p|s)+)>
<!ATTLIST Fld245
    AddEnty (No|Yes|Blank) #IMPLIED
    NFChars (0|1|2|3|4|5|6|7|8|9|Blnk) #IMPLIED>

…etc…

<!-- Subfield Element Declarations -->
<!ELEMENT a - O (#PCDATA)>
<!ELEMENT b - O (#PCDATA)>
<!ELEMENT c - O (#PCDATA)>
<!ELEMENT d - O (#PCDATA)>
<!ELEMENT e - O (#PCDATA)>

2004.11.16 - SLIDE 64IS 202 – FALL 2004

Example – METS

• METS (the Metadata Encoding and Transmission Standard) is a new Schema intended to provide:
  – “a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium”

• METS can be used to “wrap” complex sets of data (the actual data, with rules for encoding binary forms), the metadata describing the parts of that data, and the sequence and conditions under which the data can or should be presented or displayed
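The "wrapping" idea can be made concrete with a skeletal METS document: a descriptive metadata section holding a Dublin Core title, a file section pointing at the content files, and a structural map giving their presentation order. The sketch below builds such a skeleton with Python's standard `xml.etree.ElementTree`. The element and attribute names are the ones recalled from the METS schema, but the output has not been validated against it; the helper names, IDs, and URLs are invented for the example.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def mets_wrapper(title, file_urls):
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a Dublin Core title wrapped inside the METS document.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="DC")
    data = ET.SubElement(wrap, q(METS, "xmlData"))
    ET.SubElement(data, q(DC, "title")).text = title

    # File section: where the content files live.
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS, "fileGrp"))

    # Structural map: the order in which the files should be presented.
    struct = ET.SubElement(root, q(METS, "structMap"))
    top = ET.SubElement(struct, q(METS, "div"), DMDID="DMD1")

    for n, url in enumerate(file_urls, start=1):
        file_id = f"FILE{n}"
        f = ET.SubElement(file_grp, q(METS, "file"), ID=file_id)
        ET.SubElement(f, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): url})
        page = ET.SubElement(top, q(METS, "div"), ORDER=str(n))
        ET.SubElement(page, q(METS, "fptr"), FILEID=file_id)
    return root


# Hypothetical two-page object.
doc = mets_wrapper("Sample object",
                   ["http://example.org/page1.tif", "http://example.org/page2.tif"])
print(ET.tostring(doc, encoding="unicode"))
```

The structural map refers to files only by ID, so the same files could be presented in a different order by writing a second map without touching the file section.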

2004.11.16 - SLIDE 65IS 202 – FALL 2004

Other Protocols and Metadata Systems Using XML

• SOAP (Simple Object Access Protocol)
• SRW (Search and Retrieval for the Web)
• OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
• RDF (Resource Description Framework)
• MPEG-7 (more next time)
• METS
• ADL Gazetteer Protocol
• DAV/DASL (Distributed Authoring and Versioning)
• SDLIP (Simple Digital Library Interoperability Protocol)
• Also versions of MARC and other formats in XML

2004.11.16 - SLIDE 66IS 202 – FALL 2004

Lecture Overview

• Review
  – Types of Controlled Vocabularies
  – Name Authority Control
• Thesaurus Design and Development
  – Controlled Vocabularies for topical description
  – Thesaurus Design
  – Steps In Thesaurus Development
  – Indexing
• Discussion (including some from last time)

2004.11.16 - SLIDE 67IS 202 – FALL 2004

Discussion Questions

• Morgan Ames on Vickery
  – Though facets are a powerful tool for organizing information, they can be very time-consuming to define. Vickery describes the creation of facets, starting with the analysis of terms used by a user group, then the sorting of the terms into facets, the development of facets (depending on how often they're used), the arrangement of the facets, and finally, the establishment of a notation for the facets. Could one automate some or all of the process of defining facets for a particular area - say, an online community? If so, which parts could be automated, and how? If not, why not - what are the limitations of automation?

2004.11.16 - SLIDE 68IS 202 – FALL 2004

Discussion Questions

• Lilia Manguy on “Thesaurus Construction”
  – The reading mentions thesauri being constructed for institutions. What are some examples of institutions with specialized thesauri? Why were they deemed necessary?

2004.11.16 - SLIDE 69IS 202 – FALL 2004

Discussion Questions

• Lilia Manguy on “Thesaurus Construction”
  – In our field, what are some scenarios in which a thesaurus would need to be constructed? How would you determine who would be your ‘expert’ consultants? Who would you choose?

2004.11.16 - SLIDE 70IS 202 – FALL 2004

Discussion Questions

• Lilia Manguy on “Thesaurus Construction”
  – Using the process outlined in the reading for constructing a thesaurus, how would you judge whether your thesaurus is good or bad?

2004.11.16 - SLIDE 71IS 202 – FALL 2004

Discussion Questions

• Sorry… We will come back to this in the section on Interfaces for IR…
  – Christine Jones on “Card Sorting”
  – Carrie Burgener on “Flamenco”

2004.11.16 - SLIDE 72IS 202 – FALL 2004

Discussion Questions

• Chitra Madhwacharyula on Org. of Info., Chap 3:

– Associative indexing is the concept in which items are linked together and any item can lead to access of other related information (e.g. hypertext documents). Is it possible to have efficient and usable associative indexing without the use of computers and if so how?

– How does Google use the concept of associative indexing?

2004.11.16 - SLIDE 73IS 202 – FALL 2004

Discussion Questions

• Chitra Madhwacharyula on Org. of Info., Chap 3:
  – In the 1930s Vannevar Bush developed the idea of the memex, "a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility". It was based on the concept of associative indexing. How similar/dissimilar is this device to the current generation of cataloging and/or retrieval systems?

2004.11.16 - SLIDE 74IS 202 – FALL 2004

Discussion Questions

• Jaime Parada on Org. of Info., Chap 5:
  – The fierce competition between vendors in the OPAC and Online Index market may increase the development of new innovative technology and better systems, but it contributes to the lack of standardization in system design. How can the Z39.50 protocol help with this issue? Does an increase in standardization reduce the innovative nature of vendors and the creation of better systems?
  – User-centered design may refer to "enhancing system performance to deliver better results, designing for particular users since one size does not fit all". How does user-centered design interfere with standardization?

2004.11.16 - SLIDE 75IS 202 – FALL 2004

Announcements and Next…

• Midterms Returned

• Extra Credit

• Next time
  – Multimedia Information Organization and Retrieval
  – Readings/Discussion:
    • Computational Media Aesthetics: Finding Meaning Beautiful
    • The Holy Grail of Content-Based Media Analysis
    • Editing Out Video Editing

2004.11.16 - SLIDE 76IS 202 – FALL 2004