Dictionaries for the Human Language Technologies virtual network

Dr Mariëtta Alberts

Focus Area Manager

Standardisation and Terminology Development

Pan South African Language Board (PanSALB)

Afrilex, 13-15 July 2005, UFS, Bloemfontein

Outline of presentation

• Introduction

• Reviewing Human Language Technologies

– Scope of HLT

– Potential of HLT

– Multilingualism and HLT

• The South African HLT initiative

– History of South African HLT project

– National Facility

– South African HLT model

• Terminology Training initiative of PanSALB

• Conclusion

1. Introduction

• South Africa is on the verge of establishing a Human Language Technology (HLT) Centre

• The Centre will probably be managed as a national facility

• It will provide an appropriate and sustainable virtual (or otherwise) infrastructure conducive to the development and effective management of reusable electronic text and speech resources

2. Reviewing Human Language Technologies (HLT)

• Human Language Technologies are enabling technologies

• They enable human beings to interact with computers by using human language (text and speech)

Human Language Technologies include:

• high-level parsing and machine translation

• applications in education and training

• public service (e-governance and e-commerce applications)

• voice-operated educational systems

• voice-operated commercial systems that can be used by illiterate people

Human Language Technologies:

• Provide interfaces that enable spoken human-machine interaction (telephone-based information systems, automated booking systems)

• Provide linguistic assistance (spelling and grammar checking)

• Provide access to multilingual polythematic information

• Empower people to actively participate in the Information Society

2.1 The scope of HLT:

• Text-based language processing

– Text analysis (e.g. spellcheckers, term extraction, search engines)

– Summarisation

– Text translation

• Speech processing

– Speech recognition (e.g. desktop or telephony environment)

– Speech synthesis

2.2 Potential of HLT:

• Access for all to the information era

• Enhanced mother-tongue or first language teaching

• Affordable multilingual documents

• Improved functionality and quality of languages

• Contact with the developing-world context

Potential of HLT...

• Availability of multilingual words and polythematic terminology: indicator of development

• Specialised communication has a central axle or hub in terminology

• Standardised terminology contributes to quality of translations, interpreting and communication

• Streamlined translation and interpreting services provide competitive advantages

2.3 Multilingualism and HLT: South African situation

• South Africa has a high illiteracy rate

• Only 22% of the citizens can function through the medium of English

• A small percentage of South Africans have access to computers; fewer still are IT literate

• The divide is even greater between rural and urban areas

• Effective e-government is necessary (e.g. birth certificates, identity documents, marriage and death certificates, telephone, electricity and water bills, traffic fines, etc.)

• All citizens should have access to information in the languages they understand best (e.g. 11 official languages; South African Sign Language; Khoe and San languages)

• Government should communicate with citizens in their own languages regarding key services (e.g. health; safety and security; education; postal services; justice (courts); banks (economy); media (electronic and print); labour (jobs); social welfare (pensions); etc.)

Language Policy and Legislation

• Multilingual policy since 1994 - South African Constitution of 1996 (Act 108 of 1996)

• Mechanisms for protecting and promoting linguistic rights were put in place

• Section 6 of the South African Constitution specifically mentions the principles of language policy, which take into consideration the multilingual nature of South African society

Establishment of PanSALB

• The Pan South African Language Board (PanSALB) (Act 59 of 1995) was established:

– to develop, promote and ensure use of South Africa’s eleven official languages, South African Sign Language (SASL) and the Khoe and San languages, and

– to promote respect for other languages used in the country, e.g. heritage languages (Dutch, French, German, Hindi, KiSwahili, Portuguese, Tamil, etc.)

• PanSALB ensures the implementation of the National Language Policy Framework (NLPF), so that all citizens have access to services, through:

• 9 Provincial Language Committees (PLCs)

– Assist Provinces with language policy formulation and implementation

• 13 National Language Bodies (NLBs)

– Standardisation (e.g. spelling and orthography rules)

– Terminology development

– Dictionary needs (general vocabulary)

– Literacy and media

– Research and Education

• 11 National Lexicography Units (NLUs)

– Compilation of comprehensive monolingual and other types of dictionaries

3. The South African HLT initiative

3.1 History

• Lexinet research programme of HSRC (1988) (Wordnet, Termnet, Docnet, Transnet, Ailang, etc.)

• PanSALB and DACST (now DAC) initiated the HLT project in 1999

• The former Minister of DACST appointed a panel of experts to investigate the establishment of an HLT virtual network

• The HLT task team concluded that an HLT National Facility should be established

• The developers of the envisaged HLT National Facility should ensure that HLT advances multilingualism in different respects, i.e.:

• Key government documents in the languages the citizens can understand best

• Electronic systems to connect lexicographers and terminologists with other language practitioners

• Electronic systems to disseminate lexicographical and terminological data

• Electronic systems to connect translators and other language workers with word and term banks

• Central government assistance to meet communication needs of all its citizens

• Local and provincial governments to serve as focal points of information dissemination (e.g. multipurpose community centres)

3.2 National Facility

• Purpose of HLT project:

– to fast-track the use and development of indigenous languages

– to promote the SA government’s policy of multilingualism

– to facilitate better service delivery for citizens to access or supply information in any of the official languages

• Basic premises for the development of HLT:

– development and effective management of reusable text and speech resources in all official languages of SA;

– capacity building with respect to research and development in the field of HLT; and

– stimulation of an HLT industry that will provide language-based electronic products which, in turn, will be applicable in all relevant sectors, especially in the government sector.

3.3 SA Human Language Technologies Model

• The South African HLT model is based on a model being implemented by the European Union (EU)

• The EU model is effectively implemented in the EU Framework Programmes (FP 3/4/5/6)

• The South African HLT model will grow exponentially as expertise and resources are developed

3.3.1 Aims of envisaged HLT virtual network

• An e-government process needs to provide citizens with:

– Access to online facilities

– Required and necessary service delivery

– Infrastructure to make it work

• Two basic prerequisites are:

– A technical infrastructure (IT access; proven and multipurpose IT systems; online language services)

– Human capital (capacity building e.g. trained and reskilled language practitioners)

3.3.2 Identified needs:

• Low general awareness level regarding HLT benefits

• Interdisciplinary curricula at tertiary level to advance HLT development

• Systematic presentation of short dedicated HLT courses

• Theoretical and practical training in the fields of lexicography and terminology

• Job creation should be carefully planned

• Upgrade and maintain a knowledge base on HLT

3.3.3 Proposed three-step strategy for development of HLT model:

• Step 1: Applied research and capacity building, production of language resources, development of enabling technologies and of an HLT industry.

• Step 2: Development of a legal framework to ensure systematic acquisition, administration and conservation of electronic language resources.

• Step 3: Development of an infrastructure to manage the implementation of the proposed HLT model

The following diagram demonstrates the various relationships:

[Diagram. Labels: Centre for Human Language Technologies: central planning, coordination & consultation; Digital Text and Speech Corpora (acquisition, enhancement, management); NLP software development; HLT training. Participating nodes: NLU P, NLU Z, University A, University C, University D, Company A, Company B, Govt Dept A, Govt Dept B, media (SABC). Resources and expertise to feed into: National Lexicographic Units (NLUs); government departments; HLT products for e-governance, e-learning and e-commerce; academic research and development; private sector development (ICT (HLT) job creation, software development, e-commerce).]

3.3.4 Role players

• Government services: national, provincial and local (e.g. e-government, e-learning, e-commerce, etc.)

• Parastatal institutions (e.g. PanSALB)

• Private sector

• Academia (tertiary education)

• Education (primary and secondary education)

3.3.5 Progress

• Parsing (Zulu and other African languages) by Special Interest Group (SiG), African Languages Association of Southern Africa (ALASA)

• Speech recognition (Tourism: pilot booking service)

• Amalgamated Banks of South Africa (ABSA) multilingual pilot project: ATM screen prompts and telephone banking prompts in African languages (Zulu, Xhosa and South Sotho)

Progress...

• TISSA (Telephone Interpreting Service of South Africa) (all ports of entry; health services; police charge offices; etc.)

• Spellcheckers: Afrikaans developed by North-West University; African languages by University of Pretoria/North-West University; future development to be a combined effort

• Microsoft human/machine interface: combined effort re terminology development

• Afrilingo: e-learning tool for language acquisition (11 official SA languages)

Progress ...

• TshwaneLex: dedicated computer software program for data capturing (lexicography)

• 11 National Lexicography Units (NLUs) of PanSALB: Monolingual dictionaries for each of the 11 official South African languages

• NLUs: Data collection and building of corpora

• NLUs: on-line dictionaries (e.g. Afrikaans, Northern Sotho (Sesotho sa Leboa))

• TshwaneTerm: dedicated computer software program for data capturing (terminology)??

Progress ...

• National term bank (multilingual, polythematic): Terminology Coordination Section (TCS) of the National Language Service (NLS), Department of Arts and Culture (DAC)

• Latin terminology: interactive multilingual e-learning project (PanSALB, CLTAL, Trydian Interactive)

• Mathematics on-line dictionary project: South African Multilingual Mathematical Lexicon (SAMML)

Lexicographical and Terminological information available on HLT virtual network

• SA Government has approved the development of a human language technology (HLT) virtual network

• All lexicography and terminology endeavours to be part of the HLT virtual network

• For multilingual words and terms to be available on the HLT virtual network to end-users (subject specialists, students, language practitioners, general public), dictionaries are needed!

4. New terminology training initiative from PanSALB:

• Members of TCs, NLBs: Guidelines to verify and authenticate terms

• Skills development: Language practitioners: terminologists, lexicographers (e.g. NLUs), translators, interpreters, linguists, teachers, journalists, language students, etc.

• Skills development: subject specialists

• Reskilling: Unemployed language workers

PanSALB skills development terminology training programme

[Diagram. Labels: PanSALB; TCS, NLS, DAC: multilingual polythematic national term bank; National Lexicography Units: monolingual general dictionaries; National Language Bodies: verify and authenticate terms (need terminographic guidelines); Provincial Language Committees; subject specialists; re-skilling of unemployed and other language workers.]

[Diagram. Labels: Lexicography; School for Languages; Terminology; Statistics; Zoology; Psychology; NLUs; NLBs; PLCs; TCS, NLS; LUs.]

5. Conclusion:

– Development of skills

– Enhancement of South African languages

– Development of languages into functional languages

– Dissemination of multilingual polythematic (speech and text) information within the South African community

– Better communication among all citizens in different spheres of life

– Improvement of computer literacy

“Utilising technology for the development of the South African languages and developing these languages for use with Human Language Technology applications such as spellcheckers, translation memories and speech-recognition systems will enhance the status of the indigenous languages and will result in increased job opportunities in the language field.”

Dr Ben Ngubane (former Minister of Arts, Culture, Science and Technology), 2003