Aug 2-5, 2002 EMELD Workshop 2002 1 Overview & Update Helen Aristar Dry The LINGUIST List & Eastern Michigan University EMELD Workshop on The Digitization of Lexical Data Aug. 2-5, 2002


Overview & Update

Helen Aristar Dry
The LINGUIST List & Eastern Michigan University

EMELD Workshop on The Digitization of Lexical Data

Aug. 2-5, 2002


What Is E-Meld?

“Electronic Metastructure for Endangered Languages Data”

5-year collaborative project, begun Sept. 2001

Participants:
• The LINGUIST List (Eastern Michigan U., Wayne State U., U. of Arizona)
• The Linguistic Data Consortium (University of Pennsylvania)
• The Endangered Languages Fund (Yale University, Haskins Laboratories)

Funded by NSF


The LINGUIST List

• 16,500 subscribers

• 106 different countries

• 4 European mirror sites:

Tübingen | Stockholm

Edinburgh | Moscow


Objectives

To aid in …

• …the preservation of Endangered Languages data and documentation
• …the development of infrastructure for linguistic archives


Components

• Metadata server facilitating access to language resources
• Promulgation of best practice in:
    • Language identification
    • Resource description
    • Markup or annotation
• Involvement of linguistic community in deciding best practice
• Query Room, where questions can be addressed to native speakers
• Demonstration project: texts and lexicons from 10 endangered languages, marked up according to best practice


Languages

• Mocovi (Guaicuruan), 7000 speakers [Grondona]
• Biao Min (Mienic), 21,000 speakers [Solnit]
• Ega (Kwa), 300 speakers [Gibbon, Connell]
• Cambap (Mambiloid), 30 speakers [Connell]
• Lakota (Macro-Siouan) [Whalen]
• Tofa (Turkic) [Harrison]
• Two from: Alamblak, Dadibi, Mapos Buang, Takaulu Kalagan, Tuwali Ifugao [SIL]
• Two from post-docs, as yet to be determined


Outreach

Workshops:
• 2001, Santa Barbara, CA. Focus: metadata, markup, language codes
• 2002, Ann Arbor/Ypsilanti, MI. Focus: lexicon markup & metadata
• 2003, 2004: workshops
• 2005, 2006: “digital institutes”


Project Emphasis: Breadth

• Widest access to information
• Web-based tools
• Open standards
• Simple interfaces


2001-2 Progress

• Metadata Collection:
    • Search facility
    • Metadata editor
• Language Identification
• Query Room
• Markup Ontology (U. of Arizona)

Slide annotations:
• ORE
• Ethnologue + LL Codes: used throughout LL site
• OLAC Service Provider
• (ELF & Rosetta)


Markup

Focus: morphosyntactic markup

Objective: a system which allows:
• Field workers to submit data in different markups
• Searchers to retrieve all relevant data despite varying markups

No “gold standard” in linguistic markup

Instead: ontology to serve as “interlanguage” for translation among markups
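
The "interlanguage" idea above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the scheme names (`scheme_a`, `scheme_b`), the tags, and the concept labels are hypothetical and are not taken from the E-MELD ontology. The sketch only shows the general pattern of mapping each markup onto shared concepts, so that tags can be translated and records retrieved regardless of which markup a field worker used.

```python
# Hypothetical concept labels; not drawn from the actual E-MELD ontology.
ONTOLOGY_CONCEPTS = {"PastTense", "PluralNumber", "Noun"}

# Each markup scheme maps its own tags onto the shared ontology concepts.
MARKUP_TO_CONCEPT = {
    "scheme_a": {"PST": "PastTense", "PL": "PluralNumber", "N": "Noun"},
    "scheme_b": {"past": "PastTense", "plur": "PluralNumber", "n": "Noun"},
}


def to_concept(scheme, tag):
    """Return the ontology concept for a scheme-specific tag, or None."""
    return MARKUP_TO_CONCEPT.get(scheme, {}).get(tag)


def translate(tag, source, target):
    """Translate a tag between two schemes by way of the shared concept."""
    concept = to_concept(source, tag)
    if concept is None:
        return None
    for candidate, mapped in MARKUP_TO_CONCEPT.get(target, {}).items():
        if mapped == concept:
            return candidate
    return None


def search(records, concept):
    """Find records annotated with a concept, whatever markup they use."""
    return [
        r for r in records
        if concept in {to_concept(r["scheme"], t) for t in r["tags"]}
    ]


# Invented records in two different markups.
records = [
    {"id": 1, "scheme": "scheme_a", "tags": ["N", "PL"]},
    {"id": 2, "scheme": "scheme_b", "tags": ["past"]},
    {"id": 3, "scheme": "scheme_a", "tags": ["PST"]},
]
past_ids = [r["id"] for r in search(records, "PastTense")]
```

Because every scheme maps only to the ontology, adding a new markup requires one new mapping rather than a translator for every existing pair of markups.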


Markup

Tool to translate common markup formats (RDF, Shoebox, Word) into XML

Tool to help linguist identify aspects of markup with concepts in the ontology

More on this today from Langendoen, Lewis, and Farrar
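
As a rough illustration of what translating a common format into XML might involve, the sketch below converts Shoebox-style backslash-coded text into an XML tree. It is not the project's tool. The field markers (`\lx`, `\ps`, `\ge`) follow common Shoebox usage, while the output element names (`lexicon`, `entry`, `field`) and the sample entries are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Invented placeholder entries, not data from any language.
SAMPLE = r"""
\lx kata
\ps n
\ge house

\lx mino
\ps v
\ge walk
"""


def shoebox_to_xml(text):
    """Convert backslash-coded lines into <entry>/<field> elements.

    Assumes each entry begins with an \\lx line; lines without a leading
    backslash (blank or continuation lines) are skipped in this sketch.
    """
    root = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == "lx" or entry is None:
            entry = ET.SubElement(root, "entry")
        field = ET.SubElement(entry, "field", {"marker": marker})
        field.text = value.strip()
    return root


root = shoebox_to_xml(SAMPLE)
xml_text = ET.tostring(root, encoding="unicode")
```

Keeping the original marker as an attribute, rather than renaming fields, leaves the later step of linking markers to ontology concepts as a separate decision for the linguist.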


Data Input Tool

• Web-based
• Potentially portable
• Creates database input, to be output as XML
• Can be customized to fit an individual language
• More on this tomorrow from Martha Ratliff & Zhenwei Chen


Affiliation w/OLAC

• Resource identification
• OLAC Service Provider

OLAC = Open Language Archives Community
• Part of the Open Archives Initiative
• Multi-disciplinary initiative to promote multi-archive searching via HTTP protocols


OLAC Metadata Set

Based on the Dublin Core set of 15 elements:

Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type

With 2 refinements:
• Subject.language
• Type.linguistic (draft of controlled vocabulary)
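
To make the element set concrete, here is a minimal sketch that assembles a metadata record from the 15 element names and the 2 refinements listed above. The XML layout (a bare `record` element with lowercase child names) and the sample values are assumptions for illustration only; the sketch does not claim to follow the actual OLAC schema or its controlled vocabularies.

```python
import xml.etree.ElementTree as ET

# The 15 Dublin Core element names from the slide, lowercased.
DC_ELEMENTS = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]
# The 2 refinements from the slide.
OLAC_REFINEMENTS = ["subject.language", "type.linguistic"]


def build_record(fields):
    """Build a <record> element, rejecting names outside the element set."""
    allowed = set(DC_ELEMENTS) | set(OLAC_REFINEMENTS)
    record = ET.Element("record")
    for name, value in fields.items():
        if name not in allowed:
            raise ValueError(f"unknown metadata element: {name}")
        ET.SubElement(record, name).text = value
    return record


# Invented sample values; "lexicon" is a guess at a vocabulary term.
record = build_record({
    "title": "Sample lexicon",
    "creator": "A. Fieldworker",
    "subject.language": "Biao Min",
    "type.linguistic": "lexicon",
})
record_xml = ET.tostring(record, encoding="unicode")
```

Rejecting unknown element names up front is one simple way a form-based editor could keep contributed records consistent.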


[Diagram: Data Provider (Archive), Data Provider 2: Individual, and Data Provider 3 (Archive) connected to the OLAC Service Provider (LINGUIST List); metadata is exchanged via http GET or POST.]
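
The exchange in the diagram can be sketched as a service provider composing an HTTP GET request to a data provider. The sketch assumes the Open Archives Initiative harvesting protocol conventions (a `verb` parameter such as `ListRecords` and a `metadataPrefix` parameter); the base URL is a hypothetical placeholder, and the `olac` prefix value is an assumption rather than something stated in the slides. No network request is made.

```python
from urllib.parse import urlencode


def harvest_url(base_url, metadata_prefix="olac", verb="ListRecords"):
    """Compose the GET URL a service provider would send to a data provider."""
    query = urlencode({"verb": verb, "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical data providers; these are not real archive addresses.
providers = [
    "http://archive.example.org/oai",
    "http://individual.example.org/oai",
]
requests_to_send = [harvest_url(p) for p in providers]
```

A service provider would issue one such request per registered data provider and merge the returned metadata into a single searchable index.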


On LINGUIST

• OLAC Search: http://linguistlist.org/olac/
    • 18 archives, 30,000+ records
• Metadata Editor (ORE): http://linguistlist.org/olac/ore/
    • Form-based editor
    • Creates OLAC metadata in XML
    • Makes it available to the OLAC search engine
• Language Lookup: http://linguistlist.org/languages