Daniel Nkemleke, Humboldt Kolleg Kamerun, 30/07/2008

Corpus Linguistics and Language Education: Development and Utility of the Corpus of Cameroon English

Daniel A. Nkemleke

Department of English, Ecole Normale Supérieure
University of Yaoundé I

Outline

Introduction: Corpus Linguistics, history

Some (main) existing corpora

Development of the Corpus of Cameroon English (CCE)

Corpus utility with reference to the CCE

Prospect

Introduction: what is Corpus Linguistics?

The study of language based on examples of “real-life” language use, collected, stored and processed via computer

Facilitated by the advent of computer technology (1960s)

Latin corpus (“body”): a body of text, i.e. any collection of more than one text, written or spoken

Introduction (cont’d): brief history

Before the 1940s/1950s: “early corpus linguistics”, a corpus-based methodology (“primitive corpora”?)

Between the 1960s and 1980s: a minority of linguists continued corpus-based work (Quirk: Survey of English Usage (SEU); Francis & Kučera: Brown Corpus; Svartvik: London-Lund Corpus)

Computer technology: major support for corpus linguistics (CL)

First African corpus: 1989, ICE-East Africa (Schmied 1989)

Second African corpus: 1992, CCE (Tiamajou 1993) / Nigeria?

Introduction (cont’d): brief history

“Thirty years ago when this research started it was considered impossible to process texts of several million words in length.

Twenty years ago it was considered marginally possible but lunatic.

Ten years ago it was considered quite possible but still lunatic. Today it is very popular.”

(Thomas/Short 1996: 4)

Some (main) existing corpora

L1 corpora
• Brown Corpus of American English
• Lancaster-Oslo/Bergen Corpus (LOB)
• London-Lund Corpus
• British National Corpus (BNC)
• Birmingham Corpus of British English

L2 corpora
• ICE-East Africa (Kenya & Tanzania)
• Corpus of Cameroon English
• Corpus of Nigerian English (?)
• Kolhapur Corpus of Indian English

Multinational corpus project
• International Corpus of English (ICE)

Four main characteristics of a corpus

1. Sampling & representativeness

Interest in a whole variety of English

Attempts to construct a “representative” sample corpus, one that represents the variety as fully as possible

Aim: as accurate and reasonable a picture as possible of a language population

Four main characteristics of a corpus (cont’d)

2. Finite size

A body of a finite number of words, e.g. 1,000,000

Figure determined at the beginning of the project

Exception: the monitor corpus, to which texts are constantly added

Four main characteristics of a corpus (cont’d)

3. Machine-readable form

Past: reference to printed text

Nowadays: machine-readable form is implied

A few in book form (e.g. the original London-Lund Corpus)

Occasionally other forms of media (microfiche, recordings)

Four main characteristics of a corpus (cont’d)

4. Standard reference

Tacitly, a corpus constitutes a standard reference

Presupposition: wide availability to other researchers

Direct comparison of results with other varieties
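Comparing results across varieties usually means comparing corpora of different sizes, so raw counts are normally normalised first. A minimal Python sketch, assuming the common per-million-words convention; the function name and both raw counts are hypothetical, and only the 820,554-word figure (the size of the written CCE component) comes from this talk.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive word count")
    # Multiply first so that integer inputs stay exact until the final division.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw counts of the same item in two corpora of different sizes.
written_cce = per_million(41, 820_554)   # 820,554 words: written CCE component
reference = per_million(50, 1_000_000)   # hypothetical 1-million-word corpus
print(f"{written_cce:.1f} vs {reference:.1f} per million words")
```

Raw counts of 41 and 50 look different, but both work out at about 50 occurrences per million words, so on this (hypothetical) item the two corpora would not actually differ.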

Development of the Corpus of Cameroon English (CCE)

Began in 1992 with the collaboration of two British universities (Birmingham/Liverpool)

Assistance from the British Council in Yaoundé

Target of a million words reached in 1994

Data used for classroom activities and research since then

2005: the project benefited from a grant from the Alexander von Humboldt Foundation (AvH)

→ Goal: further development (tagging) of the database (TU Chemnitz)

Objectives

Provide authentic data for describing the main features and problems inherent in the variety of English written in Cameroon

Provide a source of authentic material for English language teaching and learning in Cameroon

Serve as a database for comparative studies of Cameroon English (CamE) in relation to other varieties of English

Text categories: written component

Text category              No. of texts  No. of words
A: Official Press                   257       126,539
B: Private Press                     42        49,098
C: Novels & Short Stories            21        77,096
D: Religion                          19        96,380
E: Tourism                            5        26,881
F: Official letters                  77        12,285
G: Private letters                  250        79,386
H: Students’ Essays                  83       137,399
I: Government Memos                  16        71,368
J: Advertisement                     10         4,875
K: Miscellaneous                     22       139,247
TOTAL                               802       820,554
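Because the categories differ widely in size, it can help to see each category's share of the written component and its average text length. The Python sketch below simply re-enters the figures from the table above (the dictionary and function names are illustrative, not part of the CCE tooling) and recomputes the totals so they can be checked against the TOTAL row.

```python
# Written component of the CCE, re-entered from the table above:
# category -> (number of texts, number of words)
WRITTEN = {
    "A: Official Press": (257, 126_539),
    "B: Private Press": (42, 49_098),
    "C: Novels & Short Stories": (21, 77_096),
    "D: Religion": (19, 96_380),
    "E: Tourism": (5, 26_881),
    "F: Official letters": (77, 12_285),
    "G: Private letters": (250, 79_386),
    "H: Students' Essays": (83, 137_399),
    "I: Government Memos": (16, 71_368),
    "J: Advertisement": (10, 4_875),
    "K: Miscellaneous": (22, 139_247),
}


def summarise(table):
    """Return totals plus each category's share of words and mean text length."""
    total_texts = sum(texts for texts, _ in table.values())
    total_words = sum(words for _, words in table.values())
    rows = [
        (name, 100 * words / total_words, words / texts)
        for name, (texts, words) in table.items()
    ]
    return total_texts, total_words, rows


total_texts, total_words, rows = summarise(WRITTEN)
print(f"{total_texts} texts, {total_words:,} words")  # compare with the TOTAL row
for name, share, mean_len in rows:
    print(f"{name:<28}{share:5.1f}% of words  ~{mean_len:,.0f} words/text")
```

On these figures the recomputed totals agree with the TOTAL row (802 texts, 820,554 words), and average text length ranges from about 160 words for official letters to about 6,330 words for the miscellaneous category, which is worth keeping in mind when comparing categories.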

Text categories: spoken component

Dialogues
1. Conversations
2. Phone calls
3. Broadcast discussions
4. Classroom lessons
5. Interviews
6. Parliamentary debates
7. Legal cross-examination
8. Business transactions

Monologues
1. Commentaries
2. Demonstrations
3. Legal presentations
4. Broadcast news
5. Broadcast talks
6. Non-broadcast talks

Corpus utility with reference to the CCE

13 possible ways in which a corpus may be useful:
1. Corpora as a source of empirical data
2. Corpora in language teaching and learning
3. Corpora in lexical studies
4. Corpora in grammar studies
5. Corpora in speech research
6. Corpora and semantic studies
7. Corpora in pragmatic and discourse studies
8. Corpora in sociolinguistic studies
9. Corpora and stylistic studies
10. Corpora in historical linguistics
11. Corpora in dialectology and variational studies
12. Corpora in psycholinguistics
13. Corpora in cultural studies

1. Corpora as a source of empirical data

Linguists can make more objective statements about language use in the variety and compare it with other varieties

• Nkemleke/Mbangwana (2001)
• Nkemleke (2003)
• Nkemleke (2004a, 2004b)
• Nkemleke (2005)
• Nkemleke (2006)
• Nkemleke (2007a, 2007b)
• Nkemleke (forthcoming: 2008a, 2008b, 2008c)
• Schmied/Nkemleke (forthcoming: 2008a, 2008b)
• A number of postgraduate projects at ENS/Faculty

2. Corpora in language teaching/learning

CCE data used for classroom activities over the years

Concordances: arrive _ NP (simplification)
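The concordance lines shown on this slide are not reproduced here. As a hedged illustration of how such key-word-in-context (KWIC) lines can be pulled out of running text, here is a minimal Python sketch. The function names, the 30-character context width, the sample sentences (invented, not CCE data) and the crude "next word is not a preposition" check used to flag possible arrive _ NP cases are all assumptions made for illustration.

```python
import re


def kwic(text, pattern, width=30):
    """Return key-word-in-context lines (left, match, right) for a regex pattern."""
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    lines = []
    for m in regex.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def next_word(context):
    """First word in the right-hand context, lower-cased ('' if none)."""
    words = re.findall(r"\w+", context)
    return words[0].lower() if words else ""


# Invented example sentences for illustration; not CCE data.
SAMPLE = ("The delegation arrived Yaoundé late at night. "
          "She arrives in Douala tomorrow. "
          "When we arrived the airport, it was raining.")

# Crude heuristic: flag hits where the verb is not followed by one of these words.
PREPOSITIONS = {"in", "at", "on", "from"}

for left, hit, right in kwic(SAMPLE, r"arriv(?:e|es|ed|ing)"):
    flag = "" if next_word(right) in PREPOSITIONS else "  <- arrive _ NP?"
    print(f"{left:>30} [{hit}] {right}{flag}")
```

A real study would run this over the corpus files and inspect every flagged line by hand, since words such as "home" legitimately follow "arrive" without a preposition.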

Value of concordances

Support teachers’ classroom explanations

Learners as researchers

Data-driven learning

A critical look at existing language teaching materials

Natural data for textbooks

CCE data have been used for studies of aspects of Cameroon English usage, e.g. Hans-Georg Wolf used data from the corpus in his book English in Cameroon, published in 2001 by Mouton de Gruyter (Berlin/New York).

3. Corpora in lexical studies

Keep track of new words and changing meanings

Call up word combinations and co-occurring words
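Calling up co-occurring words typically means counting collocates in a fixed window around a node word. The Python sketch below assumes a window of two words either side, a simple regex tokenizer and raw co-occurrence counts (no association measure such as mutual information); the function names and the sample text are invented for illustration and are not CCE data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text for illustration; not CCE data.
text = ("The quarter head called a meeting. Every quarter head in the town came, "
        "and the quarter meeting lasted all day.")
tokens = tokenize(text)
c = collocates(tokens, "quarter", window=2)
print(c.most_common(3))
```

On the sample, "the" ties with "head" and "meeting" as a top collocate of "quarter", which is why corpus studies usually filter out function words or rank collocates with a statistical association measure rather than raw frequency.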

Prospect

ICE-Cameroon is ongoing

Possibility of more specialized corpora in future, e.g. academic texts, fiction

END

Thank You!