Language Documentation in the 21st Century. Prof Peter K. Austin, Endangered Languages Academic Programme, Department of Linguistics, SOAS. Department of Linguistics, University of Hong Kong, 13th September 2013

Language Documentation in the 21st Century

DESCRIPTION

Lecture given at University of Hong Kong Linguistics Department, 13 September 2013

Page 1: Language Documentation in the 21st Century

1

Language Documentation in the 21st Century

Prof Peter K. Austin

Endangered Languages Academic Programme

Department of Linguistics, SOAS

Department of Linguistics, University of Hong Kong

13th September 2013

© 2013 Peter K. Austin

Creative commons licence:

Attribution-NonCommercial-NoDerivs CC BY-NC-ND

Outline

• Language documentation in 1995 and today

• Establishing principles for the field

• Developments since 2005

• Some current challenges

• Conclusions

Language documentation

• “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998)

• has developed over the last 20 years in large part in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information and communication technologies

• essentially concerned with roles of language speakers and their rights and needs

Publications: books and journals

• Gippert et al 2006 Essentials of Language Documentation. Mouton

• Tsunoda 2006 Language endangerment and language revitalization: an introduction

• Language Documentation and Description – 11 issues (2,000+ copies sold), 1 in prep

• Language Documentation and Conservation – 6 issues (on-line only)

• Cambridge Handbook of Endangered Languages 2011

• Routledge Essential Readings 2011

• Oxford Bibliography Online 2012

DoBeS projects

ELAR deposits

Main features (Himmelmann 2006:15)

• Primary data – collection and analysis of an array of primary language data to be made available for a wide range of users;

• Accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;

• Long-term storage and preservation of primary data – includes a focus on archiving in order to ensure that documentary materials are made available to potential users now and into the distant future;

Main features (cont.)

• Interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to mainstream (“core”) linguistics alone

• Cooperation with and direct involvement of the speech community – active and collaborative work with community members both as producers of language materials and as co-researchers

• Outcome is an annotated and translated corpus of archived representative materials on a language

Stuart McGill Cicipu corpus

Cicipu Toolbox

Critique: Dobrin, Austin & Nathan 2007

• “subtle and pervasive kinds of commoditisation (reduction of languages to common exchange values) abound, particularly in competitive and programmatic contexts such as grant-seeking and standard-setting where languages are necessarily compared and ranked”

• archivism: quantifiable properties such as recording hours, data volume, and file parameters, and technical desiderata like ‘archival quality’ and ‘portability’ have become reference points in assessing the aims and outcomes of language documentation – these are not measures of quality

[Figure: “documentary dog” and “archiving tail”, marked with an X]

Skills issues

• video madness: video recordings are made without reference to hypotheses, goals, or methodology, simply because the technology is available, portable and relatively inexpensive

• audio skills are lacking: documentary linguists show little or no knowledge about recording arts and microphone types, properties and placement (microphone choice and handling is the single greatest determiner of recording quality)

• corpus taming: documentary linguists show little ability at corpus and metadata management, ranging from file naming to bundle organisation
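
As an illustration of the corpus-management point, here is a minimal Python sketch that checks file names against a naming convention and groups files into bundles. The convention (`<language code>_<date>_<session>_<speaker>.<ext>`), the placeholder code `xxx`, the file names and the function name are all assumptions made for this example; they are not recommendations from the lecture or from any archive.

```python
import re
from collections import defaultdict

# Hypothetical convention: <lang>_<YYYYMMDD>_<session>_<speaker>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<lang>[a-z]{3})_(?P<date>\d{8})_(?P<session>\d{2})"
    r"_(?P<speaker>[A-Z]{2,3})\.(?P<ext>wav|mp4|eaf|txt)$"
)


def group_into_bundles(filenames):
    """Return (bundles, rejected).

    bundles maps a shared basename to the sorted list of its extensions;
    rejected lists the names that do not follow the convention.
    """
    bundles = defaultdict(list)
    rejected = []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            rejected.append(name)
            continue
        basename = name.rsplit(".", 1)[0]
        bundles[basename].append(match.group("ext"))
    return {base: sorted(exts) for base, exts in bundles.items()}, rejected


# Invented file names, for illustration only.
files = [
    "xxx_20130101_01_AB.wav",
    "xxx_20130101_01_AB.eaf",
    "xxx_20130101_02_AB.mp4",
    "recording final (2).wav",
]
bundles, rejected = group_into_bundles(files)
```

Files that share a basename end up in one bundle, so a recording stays with its annotation file, and names that break the convention are reported rather than silently skipped.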

Myopia (Austin 2012)

• ILG blindness: many documenters believe that interlinear glossing is the “gold standard” of annotation but it is very time-consuming and illegible to non-linguists – overview annotation may be preferred as a primary goal: “roadmap” or index of a recording – approximately time-aligned information about what is in the recording, who is participating, and other interesting phenomena

• Toolbox and ELAN as “Nietzsche’s typewriter”
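
The overview annotation described above (a “roadmap” of approximately time-aligned information) could be represented along the following lines. This is only a sketch: the class, its field names and the example segments are invented for illustration and do not reflect any particular tool's file format.

```python
from dataclasses import dataclass


@dataclass
class OverviewSegment:
    start_s: float        # approximate start, in seconds
    end_s: float          # approximate end, in seconds
    participants: tuple   # who is taking part
    summary: str          # what is in this stretch of the recording


def segments_at(roadmap, time_s):
    """Return the overview segments that cover a given time (in seconds)."""
    return [seg for seg in roadmap if seg.start_s <= time_s < seg.end_s]


def as_index(roadmap):
    """Format the roadmap as a plain-text index, one line per segment."""
    lines = []
    for seg in roadmap:
        who = ", ".join(seg.participants)
        lines.append(f"{seg.start_s:>6.0f}-{seg.end_s:<6.0f} {who}: {seg.summary}")
    return "\n".join(lines)


# Invented example content.
roadmap = [
    OverviewSegment(0, 90, ("Speaker A",), "greetings"),
    OverviewSegment(90, 600, ("Speaker A", "Speaker B"), "narrative"),
    OverviewSegment(600, 720, ("Speaker B",), "song"),
]
current = segments_at(roadmap, 120)
```

An index like this is quick to produce and readable by non-linguists, which is the contrast with interlinear glossing drawn above.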

• with no guiding framework for assessing quality, progress, and value in their work, documentary linguists fall back on established patterns, referring to quantifiable indices of language vitality or technical standards for the density of acoustic information even when these are not rationalised by the particular language or research situation

• diversity (goals, contexts, people) – move away from “Noah’s Ark” projects to more specialised documentation, eg. ELDP 2012 grant list

• we need more and better attention to goals, methods, skills, outcomes and values of language documentation

A 21st century model

Woodbury 2011 enlarges concept of language documentation:

“creation, annotation, preservation and dissemination of transparent records of a language.”

and identifies several gaps in a Himmelmann-type approach:

“While simple in concept, it is complex and multifaceted in practice because:

• its object, language, encompasses conscious and unconscious knowledge, ideation and cognitive ability, as well as overt social behaviour;

• records of these things must draw on concepts and techniques from linguistics, ethnography, psychology, computer science, recording arts and more;

A 21st century model (cont.)

• the creation, annotation, preservation and dissemination of such records pose new challenges in all these fields, as well as information and archival sciences; and

• “above all, humans experience their own and other people’s languages viscerally and have differing stakes, purposes, goals and aspirations for language records and language documentation”

Woodbury emphasises:

• Diversity of goals, purposes and outcomes

• Need for a theory of the documentary corpus

• Need for accounts of individual project designs

Need for meta-documentation (Austin 2013)

• meta-documentation concerns the theory and practices of meta-data, data about the data being collected and analysed

• metadata:

  • is needed for identification, management, retrieval of the data

  • provides the context and understanding of that data

  • carries those understandings into the future, and to others (and hence is important for archiving and preservation)

• reflects knowledge and practices of data providers

Metadata

• defines and constrains audiences and usages for the data

• all value-adding to recordings of events involves the creation of metadata – all annotations (transcriptions, translations, glosses, pos tagging, etc.) are metadata (Nathan and Austin 2004)
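
The view that all annotations are metadata can be sketched as a single record in which contextual description and annotation tiers sit side by side. The class and field names below are assumptions for illustration, and the values are placeholders rather than real data.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    tier: str       # e.g. "transcription", "translation", "gloss", "pos"
    start_s: float
    end_s: float
    value: str


@dataclass
class RecordedEvent:
    identifier: str                                   # identification and retrieval
    context: dict = field(default_factory=dict)       # setting, participants, etc.
    annotations: list = field(default_factory=list)   # value added to the recording

    def annotate(self, tier, start_s, end_s, value):
        self.annotations.append(Annotation(tier, start_s, end_s, value))

    def tiers(self):
        """Return the distinct annotation tiers present, sorted by name."""
        return sorted({a.tier for a in self.annotations})

    def metadata_item_count(self):
        # Context fields and annotations are counted together,
        # because in this view both are metadata about the recording.
        return len(self.context) + len(self.annotations)


event = RecordedEvent(
    "session-001",
    {"setting": "placeholder", "participants": "placeholder"},
)
event.annotate("transcription", 0.0, 2.5, "placeholder transcription")
event.annotate("translation", 0.0, 2.5, "placeholder translation")
```

Counting context fields and annotations together is a deliberate choice here: it mirrors the claim that transcriptions, translations and glosses are metadata in the same sense as descriptive catalogue fields.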

Metadata gaps

• recommendations for creating metadata for language documentation have been primarily influenced by library concepts (eg. Dublin Core), and key metadata notions have been interoperability, standardisation, discovery, and access (OLAC, EMELD, Farrar & Langendoen 2003).

• the goals of language documentation mean this is not powerful enough and we need a theory of metadata, largely lacking until now

• Nathan (2010): “meta-documentation is the documentation of your data itself, and the conditions (linguistic, social, physical, technical, historical, biographical) under which it was produced. Such meta-documentation should be as rich and appropriate as the documentary materials themselves”

Missing meta-documentation categories

• identity of stakeholders involved and their roles in the project

• attitudes and ideologies of language consultants, both towards their languages and towards the documenter and documentation project

• relationships with consultants and community

• goals and methodology of researcher, including research methods and tools (see Lüpke 2010), corpus theorisation (Woodbury 2011), theoretical assumptions embedded in annotation (abbreviations, glosses), potential for revitalisation

• biography of the project, including background knowledge and experience of the researcher and main consultants (eg. how much fieldwork the researcher had done at the beginning of the project and under what conditions, what training the researcher and consultants had received)

• for funded projects, includes original grant application and any amendments, reports to the funder, email communications with the funder and/or any discussions with an archive (eg. reviews of sample data)
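
The categories listed above could serve as a simple checklist for a project's meta-documentation. The sketch below is one possible encoding: the key names paraphrase the bullet points and are assumptions for this example, not an established schema, and the project values are invented.

```python
# Category keys paraphrase the bullet points above (hypothetical names).
META_DOC_CATEGORIES = (
    "stakeholders_and_roles",
    "attitudes_and_ideologies",
    "relationships",
    "goals_and_methodology",
    "project_biography",
    "funding_documents",   # only relevant for funded projects
)


def missing_categories(meta_doc):
    """Return the categories that are absent or empty in a meta-documentation dict."""
    return [c for c in META_DOC_CATEGORIES if not meta_doc.get(c)]


# Invented example project.
project = {
    "stakeholders_and_roles": {"researcher": "principal investigator"},
    "goals_and_methodology": "overview annotation first, glossing for selected texts",
    "funding_documents": [],
}
gaps = missing_categories(project)
```

A category that is present but empty (here `funding_documents`) is reported as a gap, on the assumption that an empty entry documents nothing.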

Archiving in the 21st century

• Two major approaches have emerged

• ‘big data’ archiving

• archiving inspired by social media models

Big data archiving

• e.g. MPI-Nijmegen

• CLARIN, DARIAH, VLO

• “integrated digital research environments that allow researchers to combine resources and tools from various sources in a seamless way” (Trilsbeek & Koenig 2013)

• component metadata initiative (CMDI)

• mandatory to link each field to a concept definition in a central data category registry called ISOcat

• goal of data mining and cross-corpus extraction, use of large scale computational linguistics tools
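
The requirement that each metadata field be linked to a concept definition can be pictured as a simple validation step. The sketch below is not CMDI itself (real component metadata is expressed in XML); the dictionary layout, the `concept_link` key and the example.org URLs are hypothetical stand-ins.

```python
def fields_without_concept(profile):
    """Return the names of fields that lack a link to a concept definition."""
    return sorted(
        name for name, spec in profile.items() if not spec.get("concept_link")
    )


# Hypothetical profile; the URLs are placeholders, not real registry entries.
profile = {
    "language_name": {"concept_link": "https://example.org/registry/language-name"},
    "recording_date": {"concept_link": "https://example.org/registry/recording-date"},
    "speaker_notes": {},
}
unlinked = fields_without_concept(profile)
```

Shared concept links are what make cross-corpus extraction feasible: two corpora can name a field differently and still be matched if both point to the same definition.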

Archive 2.0: social media models

• traditionally archiving focussed heavily on preservation

• however documentation often deals with highly sensitive topics (sacred stories, gossip)

• needs powerful but flexible access management

• transparency – ease of understanding

• use positively – social networking model

• access through relationships

• relationships and sharing produce new opportunities

• ELAR URCS system
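
A minimal sketch of flexible access management, in which the depositor grants access to restricted material user by user. The three access levels and all names are assumptions for illustration; this is not a description of how the ELAR system is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    access: str                                   # "open", "registered" or "restricted"
    approved_users: set = field(default_factory=set)


def can_view(resource, user=None):
    """Decide whether a user (None means not logged in) may view a resource."""
    if resource.access == "open":
        return True
    if user is None:
        return False
    if resource.access == "registered":
        return True
    # Anything else is treated as restricted: only approved users may view it.
    return user in resource.approved_users


def approve(resource, user):
    """Record the depositor's approval of a user's access application."""
    resource.approved_users.add(user)


# Invented example: a sensitive recording that starts out restricted.
story = Resource("sacred_story.wav", "restricted")
before = can_view(story, "xx")
approve(story, "xx")
after = can_view(story, "xx")
```

Treating unknown access levels as restricted is a cautious default, which suits material on sensitive topics.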

ELAR URCS system

• e.g. Trevor Johnson Auslan deposit

• Logged in user displays

OAIS model

OAIS archives define three types of ‘packages’: ingestion, archive, dissemination

[Figure: OAIS package flow. Labels: Producers, Ingestion, Archive, Dissemination, Designated communities]
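
The three package types can be sketched as successive transformations of a deposit. In the OAIS reference model they are called Submission, Archival and Dissemination Information Packages; the class, functions and file names below are simplified assumptions for illustration, not an implementation of the standard.

```python
from dataclasses import dataclass


@dataclass
class Package:
    kind: str     # "ingestion", "archive" or "dissemination"
    files: tuple
    notes: dict


def ingest(files, depositor):
    """Producers hand over materials as an ingestion package."""
    return Package("ingestion", tuple(files), {"depositor": depositor})


def to_archive(pkg, archive_id):
    """The archive turns an ingestion package into an archive package."""
    if pkg.kind != "ingestion":
        raise ValueError("only ingestion packages can be archived")
    return Package("archive", pkg.files, {**pkg.notes, "archive_id": archive_id})


def disseminate(pkg, wanted_suffixes):
    """Derive a dissemination package holding only the requested file types."""
    if pkg.kind != "archive":
        raise ValueError("only archive packages can be disseminated")
    selected = tuple(f for f in pkg.files if f.endswith(tuple(wanted_suffixes)))
    return Package("dissemination", selected, {"archive_id": pkg.notes["archive_id"]})


# Invented file names and identifiers.
sip = ingest(["a.wav", "a.eaf"], "depositor")
aip = to_archive(sip, "deposit-001")
dip = disseminate(aip, [".eaf"])
```

Keeping the packages separate means that what is delivered to a designated community can differ from what is stored, for example by leaving out restricted or very large files.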

ELAR archive 2.0 model

Rethinking the archive model

• progressive archiving – a challenge to whole approach of documentary linguistics

• establish user account at beginning of project – users add and manage/update resources over time

• user accounts show access and usage/downloads analytics – cf. Academia.edu
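
Progressive archiving, as outlined above, amounts to a deposit that exists from the start of a project, is updated over time, and reports usage back to its owner. The following sketch shows one way this could look; the class and method names are hypothetical and do not describe any existing archive's software.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Deposit:
    owner: str
    resources: dict = field(default_factory=dict)        # name -> version number
    downloads: Counter = field(default_factory=Counter)  # name -> download count

    def add_or_update(self, name):
        """Add a resource, or bump its version if it is already deposited."""
        self.resources[name] = self.resources.get(name, 0) + 1
        return self.resources[name]

    def record_download(self, name):
        if name not in self.resources:
            raise KeyError(name)
        self.downloads[name] += 1

    def usage_report(self):
        """Return download counts for every resource, including unused ones."""
        return {name: self.downloads[name] for name in sorted(self.resources)}


# Invented example: the account is opened first, content follows over time.
deposit = Deposit("depositor")
first = deposit.add_or_update("texts.eaf")
second = deposit.add_or_update("texts.eaf")   # revised annotation, new version
deposit.add_or_update("songs.wav")
deposit.record_download("texts.eaf")
report = deposit.usage_report()
```

Because resources are versioned rather than frozen at the end of a project, the deposit can hold work in progress, which is the contrast with archiving only after collection and publication are complete.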

“classical” archiving

collect resources/data archive them

Collect, process, publish Archive

And hope that death does not intervene

progressive archiving

Rethinking archive participation

• users

  • e.g. add bookmarks, negotiate access

• depositors

  • e.g. updating and editing content

  • negotiate access

  • monitoring usage

• collaborations

  • exchange & share information

  • establish groups

  • community curation

User xx has just applied for access to restricted material in the deposit johnston2012auslan. The following message was attached to the application:

"Hello [depositor], xx here. I'm interested in having a look at some of your video deposit, including annotation files. I am working on a project documenting Central Australian Indigenous sign with yy (see http://iltyemiltyem.tumblr.com/). If ok, I'd like to see how you do the annotation - we have worked out a template and annotation protocol, but this needs a lot of refinement. Regards, MC"

Application: from depositor’s friend, re methods

This email is to inform you that user xx's application for access to restricted material in the deposit kunbarlang-389 has just been approved. The depositor included the following note to the user:

"Hi xx, I've approved your access to this collection, but you should know that there is an update in the material I've just deposited, with much more information on both music and texts. I'd be happy to give you access to that when it is processed.

Next time I come to London (October or November this year) I'd be happy to meet up if you would like to discuss."

Response: further info and offer to meet

User xx has just applied for access to restricted material in the deposit cappadocian-375. The following message was attached to the application:

"Dear [depositor], I work as a research assistant in Nevsehir University in Cappadocia, Turkey. As you know, Cappadocian language has some relics in this region despite speakers of Cappadocian do not live anymore. In my university, there are few research on this subject with collaboration of Greek friends and local societies … I would like to access to your material … By the way, i would like to interview with you about Cappadocian language for our international journal of art and language. I hope you will have time for our journal . Thank you in advance."

Application: establish credentials and make request

This email is to inform you that user xx's application for access to restricted material in the deposit johnston2012auslan has just been approved. The depositor included the following note to the user:

"I am giving you user access which means you should be able to see the ELAN eaf annotation files for the topics "The boy who cried wolf" and for "The hare and the tortoise". You should also be able to see most other movies except those tagged "1a" "4a" and "5". If you cannot see the ELAN eaf annotations I hope the problem will be fixed soon. I told the ELAR team about this."

Response: approval with details and guide

Applied documentation

• Should documentation contribute to sustaining language and cultural diversity and the communities who want to maintain and develop them?

• What would documentary linguistics look like if it took revitalisation (and pedagogy) as its primary goal – e.g. types of data, learner-directed language, sequencing? See Nathan & Fang 2013

• Are there mismatches between linguists’ ideologies of endangered languages and documentation and community ideologies? See Austin & Sallabank 2014

Examples

• emergence of examples of applied language documentation and language and cultural revitalisation, eg. papers in LDD 11, Wuqu’ Kawoq (from Guatemala), Maori (from New Zealand)

• this year I have been involved in a project with the Dieri Aboriginal Corporation in Australia aimed at cultural and linguistic repatriation and revival which has taught me a lot about links between primary documentation and its applications

“… it seems that in general many documenters are struggling with formal aspects of their documentary work because of a late recognition by leaders in documentary linguistics that a good language documentation might be very much more than a set of dozens, hundreds, or thousands of files in archiveable formats.” (Nathan 2012)

Conclusions

• we need to move beyond 20th century models of language documentation and archiving and become more reflexive and analytical about our goals, practices, methods and values

• we need to bring more of the social aspect of human life into language documentation and linguistic research (where it has been largely missing for the past 20 years of renewed interest in endangered languages), replacing objectification and commodification with concern for what is special and unique about the contexts, and the people, cultures and languages we are attempting to document and support

 唔該

Thank you!