Aug 2-5, 2002 EMELD Workshop 2002 1
Overview & Update
Helen Aristar Dry
The LINGUIST List & Eastern Michigan University
EMELD Workshop on The Digitization of Lexical Data
Aug. 2-5, 2002
What Is E-Meld?
“Electronic Metastructure for Endangered Languages Data”
5-year collaborative project, begun Sept. 2001
Participants:
• The LINGUIST List (Eastern Michigan U., Wayne State U., U. of Arizona)
• The Linguistic Data Consortium (University of Pennsylvania)
• The Endangered Languages Fund (Yale University, Haskins Laboratories)
Funded by NSF
The LINGUIST List
• 16,500 subscribers
• 106 different countries
• 4 European mirror sites:
Tübingen | Stockholm
Edinburgh | Moscow
Objectives
To aid in …
• …the preservation of Endangered Languages data and documentation
• …the development of infrastructure for linguistic archives
Components
• Metadata server facilitating access to language resources
• Promulgation of best practice in language identification, resource description, and markup or annotation
• Involvement of linguistic community in deciding best practice
• Query Room, where questions can be addressed to native speakers
• Demonstration project: texts and lexicons from 10 endangered languages (ELs) marked up according to best practice
Languages
• Mocovi (Guaicuruan), 7,000 speakers [Grondona]
• Biao Min (Mienic), 21,000 speakers [Solnit]
• Ega (Kwa), 300 speakers [Gibbon, Connell]
• Cambap (Mambiloid), 30 speakers [Connell]
• Lakota (Macro-Siouan) [Whalen]
• Tofa (Turkic) [Harrison]
• Two from: Alamblak, Dadibi, Mapos Buang, Takaulu Kalagan, Tuwali Ifugao [SIL]
• Two from post-docs, as yet to be determined
Outreach
Workshops:
• 2001 – Santa Barbara, CA; focus: metadata, markup, language codes
• 2002 – Ann Arbor/Ypsilanti, MI; focus: lexicon markup & metadata
• 2003, 2004: workshops
• 2005, 2006: “digital institutes”
Project Emphasis: Breadth
• Widest access to information
• Web-based tools
• Open standards
• Simple interfaces
2001-2 Progress
• Metadata collection: search facility (OLAC Service Provider), metadata editor (ORE)
• Language identification: Ethnologue + LL codes, used throughout LL site
• Query Room (ELF & Rosetta)
• Markup ontology (U. of Arizona)
Markup
Focus: morphosyntactic markup
Objective: a system which allows:
• Field workers to submit data in different markups
• Searchers to retrieve all relevant data despite varying markups
No “gold standard” in linguistic markup
Instead: ontology to serve as “interlanguage” for translation among markups
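The “interlanguage” idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two tag schemes (`scheme_a`, `scheme_b`), their tags, and the concept names are hypothetical and are not taken from the E-MELD ontology. Each scheme maps its tags to shared concepts, so translation and search both go through the concepts rather than through pairwise tag tables.

```python
# Hypothetical tag inventories. Neither the tags nor the concept names
# come from the E-MELD ontology; they only illustrate the idea.
TO_CONCEPT = {
    "scheme_a": {"PST": "PastTense", "PL": "PluralNumber", "1": "FirstPerson"},
    "scheme_b": {"past": "PastTense", "plur": "PluralNumber", "1p": "FirstPerson"},
}


def from_concept(scheme):
    """Invert a scheme's tag-to-concept table."""
    return {concept: tag for tag, concept in TO_CONCEPT[scheme].items()}


def translate(tags, source, target):
    """Translate tags from one scheme to another by way of shared concepts."""
    to_concept = TO_CONCEPT[source]
    to_tag = from_concept(target)
    return [to_tag[to_concept[tag]] for tag in tags]


def search(records, concept):
    """Return ids of records whose tags express the concept, whatever the scheme."""
    hits = []
    for record_id, scheme, tags in records:
        concepts = {TO_CONCEPT[scheme][tag] for tag in tags}
        if concept in concepts:
            hits.append(record_id)
    return hits


# Made-up records submitted in different markups.
records = [
    ("rec1", "scheme_a", ["PST", "1"]),
    ("rec2", "scheme_b", ["past", "plur"]),
    ("rec3", "scheme_b", ["1p"]),
]

print(translate(["PST", "PL"], "scheme_a", "scheme_b"))
print(search(records, "PastTense"))
```

With an interlanguage, adding a new markup scheme requires one mapping to the concepts instead of one mapping per existing scheme.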
Markup
Tool to translate common markup formats (RDF, Shoebox, Word) into XML
Tool to help linguist identify aspects of markup with concepts in the ontology
More on this today from Langendoen, Lewis, and Farrar
Data Input Tool
• Web-based
• Potentially portable
• Creates database input, to be output as XML
• Can be customized to fit individual language
• More on this tomorrow from Martha Ratliff & Zhenwei Chen
Affiliation w/OLAC
• Resource identification
• OLAC Service Provider
OLAC = Open Language Archives Community
• Part of Open Archives Initiative
• Multi-disciplinary initiative to promote multi-archive searching via http protocols
OLAC Metadata Set
Based on Dublin Core set of 15 elements:
Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type
With 2 refinements:
• Subject.language
• Type.linguistic (draft of controlled vocabulary)
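A metadata record built from these elements might look roughly like the sketch below. The Dublin Core namespace URI is the standard one. The `record` wrapper, the `refine` attribute used to carry a refinement, and the sample values are assumptions made for illustration, not the actual OLAC XML format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)

# The 15 Dublin Core elements listed on the slide.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def build_record(fields):
    """Build a record from (name, value) pairs.

    A name is either a plain element ("Title") or an element plus a
    refinement ("Subject.language").
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        base, _, refinement = name.partition(".")
        base = base.lower()
        if base not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{base}")
        if refinement:
            # Hypothetical encoding: the refinement is stored as an attribute.
            child.set("refine", refinement)
        child.text = value
    return record


# Made-up sample values, for illustration only.
record = build_record([
    ("Title", "Sample lexicon"),
    ("Subject.language", "Ega"),
    ("Type.linguistic", "lexicon"),
])
print(ET.tostring(record, encoding="unicode"))
```

Rejecting names outside the 15-element set keeps records searchable across archives, which is the point of a shared metadata set.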
[Diagram: Data Providers (archives and individuals) supply Metadata via http GET or POST to the OLAC Service Provider (LINGUIST List)]
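The “http: GET or POST” step can be sketched as follows, assuming the exchange follows the Open Archives Initiative harvesting protocol (OAI-PMH), in which a service provider requests records from a data provider with a `verb` and a `metadataPrefix` parameter. The base URL and the function name `harvest_request` are hypothetical, and the sketch only constructs the request without sending it.

```python
from urllib.parse import urlencode


def harvest_request(base_url, metadata_prefix="oai_dc", method="GET"):
    """Return (url, body) for an OAI-PMH ListRecords request.

    For GET the parameters go in the query string and body is None.
    For POST the same parameters are form-encoded in the body.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    if method == "GET":
        return f"{base_url}?{query}", None
    if method == "POST":
        return base_url, query.encode("ascii")
    raise ValueError(f"unsupported method: {method}")


BASE_URL = "http://archive.example.org/oai"  # hypothetical data provider

get_url, get_body = harvest_request(BASE_URL)
post_url, post_body = harvest_request(BASE_URL, method="POST")
print(get_url)
print(post_url, post_body)
```

`oai_dc` is the unqualified Dublin Core format that OAI-PMH repositories are required to support; an OLAC-specific prefix would be passed in its place where a data provider offers one.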
On LINGUIST
• OLAC Search: http://linguistlist.org/olac/ (18 archives, 30,000+ records)
• Metadata Editor (ORE): http://linguistlist.org/olac/ore/
  Form-based editor; creates OLAC metadata in XML; makes it available to OLAC search engine
• Language Lookup: http://linguistlist.org/languages