Metadata

Helen Aristar Dry, Eastern Michigan University, LINGUIST List

Symposium on Best Practice, LSA, Boston, MA, Jan 9, 2004




Outline

What is metadata?

Why use OLAC metadata?

How can you write OLAC metadata for your resources?

Metadata in XML

Using ORE


Preliminaries

Language documentation is valuable only if it is findable

On the Internet, this means “findable by computational means”

Efficient search and retrieval of language resources requires the use of metadata


Metadata is:

Structured data about data

Similar to catalogue information

Usually consists of a set of elements, each of which describes a property of the resource

The elements of a metadata set can be encoded in different “languages,” e.g., HTML, XML, RDF/XML


An example

Title: Biao Min Data

Creator (depositor): David Solnit

Subject (linguistic field): Language Description

Subject (language): Biao Min

Date created: April 5, 1982

Description: The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . .


Example in HTML

<meta name="DC.title" content="Biao Min Data" />
<meta name="DC.creator" content="David Solnit" />
<meta name="DC.subject" content="Language Description" />
<meta name="DC.subject" content="Biao Min" />
<meta name="DCTERMS.created" content="1982-04-05" />
<meta name="DC.description" content="The Biao Min data on the E-MELD site includes over 3,000 lexical items. . . . ." />
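As a rough illustration of how such tags could be produced from catalogue-style fields, here is a minimal Python sketch. The function name `to_meta_tags` and the list-of-pairs layout are assumptions made for this example, not part of Dublin Core or OLAC; only the names and values are taken from the slide.

```python
from html import escape

# Field values copied from the Biao Min example. Pairs are kept in a list
# (not a dict) because an element such as DC.subject may be repeated.
record = [
    ("DC.title", "Biao Min Data"),
    ("DC.creator", "David Solnit"),
    ("DC.subject", "Language Description"),
    ("DC.subject", "Biao Min"),
    ("DCTERMS.created", "1982-04-05"),
]


def to_meta_tags(pairs):
    """Render (name, content) pairs as HTML <meta> tags, one per line."""
    return "\n".join(
        '<meta name="{}" content="{}" />'.format(
            escape(name, quote=True), escape(content, quote=True)
        )
        for name, content in pairs
    )


print(to_meta_tags(record))
```

Escaping the attribute values keeps the output valid HTML even when a title or description contains quotation marks.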


Example in XML

<title>Biao Min Data</title>
<creator xsi:type="olac:role" olac:code="depositor">David Solnit</creator>
<subject xsi:type="olac:linguistic-field" olac:code="language_description"/>
<subject xsi:type="olac:language" olac:code="x-sil-BJE">Biao Min</subject>
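The same elements can also be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the `record` wrapper element, the helper name `qualified`, and the OLAC namespace URI are assumptions made for illustration, so check the OLAC documentation before relying on them.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed namespace URI for OLAC metadata; verify against the OLAC schema.
OLAC = "http://www.language-archives.org/OLAC/1.0/"

# Ask ElementTree to serialize these namespaces with readable prefixes.
ET.register_namespace("xsi", XSI)
ET.register_namespace("olac", OLAC)


def qualified(parent, tag, olac_type, code, text=None):
    """Append an element carrying an xsi:type and an olac:code attribute."""
    element = ET.SubElement(
        parent,
        tag,
        {
            "{%s}type" % XSI: "olac:" + olac_type,
            "{%s}code" % OLAC: code,
        },
    )
    element.text = text
    return element


record = ET.Element("record")  # hypothetical wrapper element
ET.SubElement(record, "title").text = "Biao Min Data"
qualified(record, "creator", "role", "depositor", "David Solnit")
qualified(record, "subject", "linguistic-field", "language_description")
qualified(record, "subject", "language", "x-sil-BJE", "Biao Min")

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Building the record through a library rather than by string concatenation leaves escaping and namespace declarations to the serializer.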


Metadata

Different metadata specifications: MARC, METS, Dublin Core, IMDI, OLAC

IMDI & OLAC designed specifically for language documentation


OLAC Metadata

Product of the Open Language Archives Community

http://www.language-archives.org/

Strengths:

Ease of creation

Search & retrieval via the protocols of the Open Archives Initiative


Open Archives Initiative

Cross-disciplinary initiative for search and retrieval of metadata from multiple archives

Establishes protocols for “harvesting” metadata records of participating archives and making them available via “Service Providers.”

Supports formation of discipline-specific sub-communities such as OLAC (Open Language Archives Community)
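To make “harvesting” a little more concrete, the following Python sketch builds the kind of request URL a service provider might send to a participating archive. `verb`, `metadataPrefix`, and `from` are request arguments of the OAI harvesting protocol; the base URL is hypothetical, and the `olac` metadata prefix is an assumption about how an OLAC archive exposes its records. No network request is made.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a participating archive.
BASE_URL = "https://archive.example.org/oai"


def harvest_url(base_url, metadata_prefix="olac", from_date=None):
    """Build a ListRecords request URL for an OAI-style harvest.

    metadata_prefix="olac" is an assumed default; from_date, if given,
    limits the harvest to records changed on or after that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


print(harvest_url(BASE_URL))
print(harvest_url(BASE_URL, from_date="2004-01-09"))
```

A service provider would fetch such a URL for each archive and merge the returned records into a single searchable index.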


LINGUIST List = OLAC Gateway

LINGUIST List is the main service provider for OLAC

Harvests metadata from 27 major archives

Collects metadata from individual linguists about their language documentation

Offers search interface for over 30,000 records of language-related data

See: http://linguistlist.org/olac/


OLAC Metadata

OAI uses the Dublin Core (DC) metadata standard

15 elements (each optional & repeatable)

Core vocabulary for refining elements (dcterms)

Sub-communities may qualify DC metadata to suit their specific needs

OLAC has qualified DC metadata to better describe language resources.


OLAC Qualifies 5 of the 15 DC Elements

Contributor, Coverage, Creator, Date, Description, Format, Identifier,
Language, Publisher, Relation, Rights, Source, Subject, Title, Type
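Because every DC element is optional and repeatable, a record can be modelled as a mapping from element name to a list of values. The Python sketch below does this and flags any key that is not one of the 15 elements; the lowercase names and the helper `unknown_elements` are conventions chosen for this example.

```python
# The 15 Dublin Core elements listed on the slide, lowercased.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record):
    """Return, sorted, the keys of `record` that are not DC elements."""
    return sorted(set(record) - DC_ELEMENTS)


# Each element maps to a list, so it may be absent (optional)
# or carry several values (repeatable).
record = {
    "title": ["Biao Min Data"],
    "creator": ["David Solnit"],
    "subject": ["Language Description", "Biao Min"],
}

print(unknown_elements(record))
print(unknown_elements({"title": ["Untitled"], "author": ["Someone"]}))
```

Note that an empty record passes this check, which is consistent with every element being optional.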


OLAC recommends 5 extensions:

Language: OLAC Language

Subject: OLAC Language, Linguistic Field

Type: Linguistic Data Type, Discourse Type

Contributor: Role

Creator: Role
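The pairing of elements and extensions above can be written down as a small lookup table. In this Python sketch the short identifiers (`language`, `linguistic-field`, `linguistic-type`, `discourse-type`, `role`) are labels assumed for illustration, and `allowed` is a hypothetical helper; consult the OLAC documentation for the official names.

```python
# Which extensions may qualify which DC element, following the slide.
# The extension identifiers are assumed labels, not official names.
EXTENSIONS = {
    "language": {"language"},
    "subject": {"language", "linguistic-field"},
    "type": {"linguistic-type", "discourse-type"},
    "contributor": {"role"},
    "creator": {"role"},
}


def allowed(element, extension):
    """Return True if `extension` may be used to qualify `element`."""
    return extension in EXTENSIONS.get(element, set())


print(allowed("subject", "linguistic-field"))
print(allowed("title", "role"))
```

Elements missing from the table, such as Title, simply take no OLAC extension in this model.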


Participant Role

Provides a controlled vocabulary for identifying the role of a Creator or Contributor more precisely. The vocabulary identifies approximately twenty roles that are common in the development of language resources.

Examples: depositor, signer, transcriber, respondent, editor, consultant, researcher.

Documentation: http://www.language-archives.org/REC/role.html


Language Identification:

Provides codes for identifying all known languages, both living and extinct.

Applies to: Language, Subject


Linguistic Field

Provides codes for identifying the content of a resource as relevant to a particular subfield of linguistic science

Applies to: Subject

Examples: anthropological_linguistics, applied_linguistics, cognitive_science, computational_linguistics, lexicography, discourse_analysis


Linguistic Data Type

Describes the resource as representing a recognized structural type of linguistic information

Applies to: Type

Examples:

Lexicon

Primary text

Language description

Dataset (already in DCterms)


Discourse Type

Provides a controlled vocabulary for identifying approximately ten discourse types. It is used with Type to identify the genre of a language resource (particularly a primary text).

Types: Interactive Discourse, Report, Singing, Oratory, Narrative, Formulaic Discourse, Procedural Discourse, Language Play, Unintelligible Speech

http://www.language-archives.org/REC/discourse.html


Writing metadata

See “metadata” in the E-MELD School of Best Practices:

http://emeld.org/school/classroom/metadata

Or use the OLAC Repository Editor:

See: http://linguistlist.org/ore/