Towards a Language-Independent Universal Digital Library
Sameh Alansary, Magdy Nagi, Noha Adly
Bibliotheca Alexandrina
The Second International Conference on Universal Digital Libraries (ICUDL 2006), 17-19 November 2006, Alexandria, Egypt
Introduction
• IT has made the full text of libraries' assets available digitally, independent of time, place and copy: universal digital libraries (UDL), e.g.:
- Million Book Project.
- Nasser Digital Library.
• Digitization alone does not lead to "universality" in its optimal sense.
• A new dimension of universality should be added: independence from language.
Language dependency blocks information dissemination
• 80% of books and e-materials are written in English and 20% in other languages.
• Language dependency maintains language barriers.
• If everyone could always read in his or her mother tongue, this would help in:
- Dissemination of knowledge.
- Preservation of nationality and identity.
- Preventing cultural hegemony.
Attempts to break language barriers
• Translation systems have been introduced (NLP).
Approaches:
1- Direct translation approach.
2- Transfer approach.
3- Interlingual approach.
Examples of systems:
- Google translation: http://www.google.ch/language_tools
- Fujitsu systems: http://www.fujitsu.com/global/services/translation
Drawbacks of MT systems
1- The quality of the results is often inadequate.
2- They work for a limited number of language combinations.
3- They impose an overload on the network: to translate from and into only 10 languages, 10 grammars, 10 lexicons, 90 translation dictionaries and 90 sets of translation rules are needed, plus semantic processing in each language.
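The figure of 90 comes from counting ordered language pairs: with n languages, a pair-based (direct or transfer) system needs one translation dictionary and one rule set per translation direction, n(n-1) in total, whereas an interlingual design needs only one analysis module (EnConverter) and one generation module (DeConverter) per language. A minimal sketch of that arithmetic, assuming exactly one resource of each kind per direction or per language; the function names are hypothetical:

```python
def pairwise_resources(n: int) -> dict[str, int]:
    """Resources for a direct/transfer MT system covering n languages."""
    directions = n * (n - 1)  # ordered source -> target pairs
    return {
        "grammars": n,
        "lexicons": n,
        "translation_dictionaries": directions,
        "translation_rule_sets": directions,
    }


def interlingual_modules(n: int) -> dict[str, int]:
    """Modules for an interlingual system: one analyser and one generator per language."""
    return {"enconverters": n, "deconverters": n}


if __name__ == "__main__":
    for n in (2, 10, 15):
        pair = pairwise_resources(n)
        inter = interlingual_modules(n)
        # language count, pairwise dictionaries, interlingual modules
        print(n, pair["translation_dictionaries"], sum(inter.values()))
```

For the 15 UNL languages the pairwise count grows to 210 directions, against 30 interlingual modules: quadratic versus linear growth.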
Towards a universal system for knowledge representation
Some questions to bear in mind:
• How can we represent natural language materials in a language-independent format? (a format is required)
• Which system is suitable for representing knowledge in the selected format? (a system is required)
• How is this system going to work?
Requirements for a universal representation of knowledge:
1- The content (meaning) of the original material must not be lost.
2- The universal format should be understandable by various platforms over the network.
3- The universal format should be decodable into any natural language.
UNL System
What is UNL? (1)
• The Universal Networking Language (UNL) is an artificial language that enables computers to express information and knowledge that can be expressed in natural language.
• Started in 1996 as an initiative of the UNU/IAS in Japan.
- Transferred to the UNDL Foundation in 2001.
• R&D in UNL:
- Development on 15 languages: Arabic, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Swahili.
What is UNL? (2)
• It expresses information or knowledge of natural language (NL) in the form of a semantic network with hyper-nodes.
Example:
The boy who works here went to school.
UNL expression:
{UNL}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/UNL}
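[Figure: UNL hyper-graph of the example. The node go(icl>move).@entry.@past is linked by plt to school(icl>institution) and by agt to the hyper-node :01; inside :01, work(icl>do) is linked by agt to boy(icl>person).@entry and by plc to here.]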
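The hyper-graph can be recovered mechanically from a textual UNL expression. A minimal sketch, assuming one relation per line in the form rel[:scope](source, target), where the optional scope id marks relations that belong to a hyper-node and a target such as :01 refers to that hyper-node; `parse_unl`, `split_top_level` and `group_by_scope` are hypothetical names, and the sketch ignores the rest of the UNL document syntax:

```python
import re

# rel[:scope](source, target); the scope id marks relations inside a hyper-node.
RELATION = re.compile(r"^(?P<rel>[a-z]+)(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def split_top_level(args: str) -> list[str]:
    """Split 'a(b,c), d' on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i].strip())
            start = i + 1
    parts.append(args[start:].strip())
    return parts


def parse_unl(document: str) -> list[tuple[str, str, str, str]]:
    """Return (relation, scope, source, target) tuples; scope is '' at top level."""
    relations = []
    for line in document.strip().splitlines():
        line = line.strip()
        if not line or line.lower() in ("{unl}", "{/unl}"):
            continue  # skip blank lines and the {unl} ... {/unl} wrapper
        m = RELATION.match(line)
        if m is None:
            raise ValueError(f"not a UNL relation: {line!r}")
        source, target = split_top_level(m.group("args"))  # relations are binary
        relations.append((m.group("rel"), m.group("scope") or "", source, target))
    return relations


def group_by_scope(relations):
    """Collect the relations of each hyper-node; 'top' is the outermost graph."""
    graph: dict[str, list[tuple[str, str, str]]] = {}
    for rel, scope, source, target in relations:
        graph.setdefault(scope or "top", []).append((rel, source, target))
    return graph


EXAMPLE = """{UNL}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/UNL}"""

if __name__ == "__main__":
    for scope, edges in group_by_scope(parse_unl(EXAMPLE)).items():
        print(scope, edges)
```

Splitting only on top-level commas matters because UWs such as allow(agt>thing,gol>thing,obj>thing) contain commas of their own.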
The UNL System
• System: components.
• Formalism: knowledge representation.
The UNL-system components
[Figure: architecture of the UNL system. Language servers for UNL <-> Arabic, Chinese, English, Hindi, Japanese and Spanish, each consisting of an EnConverter (EnCO) and a DeConverter (DeCO), are connected over the Internet with a UNL proxy, a web server holding UNL documents, and the user's UNL tools (UNL editor, UNL viewer).]
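A) Language servers:
[Figure: a language server. The EnConverter converts natural language into UNL using analysis rules; the DeConverter converts UNL into natural language using generation rules and a co-occurrence dictionary; both draw on the UNL-language dictionary and the knowledge base.]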
B) UNL Tools:
1- UNL viewer.
2- UNL editor.
3- UNL verifier.
C) UNL Proxy Server:
• Searches the web for UNL documents, sends them to the language server and displays them in the user's chosen language.
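The proxy's role can be sketched as three steps: find the UNL blocks in a page, hand each to the language server for the user's language, and return the generated text. A minimal sketch, assuming UNL is embedded in pages between {unl} and {/unl} markers and modelling each language server as a plain function from UNL to text; `proxy`, `extract_unl` and the server dictionary are hypothetical, not the UNL proxy's actual interface:

```python
import re
from typing import Callable

# A UNL block embedded in a web page: {unl} ... {/unl} (case-insensitive).
UNL_BLOCK = re.compile(r"\{unl\}(.*?)\{/unl\}", re.IGNORECASE | re.DOTALL)

# Hypothetical: a language server is modelled as a function UNL -> text.
Deconverter = Callable[[str], str]


def extract_unl(page: str) -> list[str]:
    """Return the UNL expressions found in a page, in document order."""
    return [block.strip() for block in UNL_BLOCK.findall(page)]


def proxy(page: str, language: str, servers: dict[str, Deconverter]) -> list[str]:
    """Send every UNL block on the page to the server for the user's language."""
    if language not in servers:
        raise KeyError(f"no language server registered for {language!r}")
    deconvert = servers[language]
    return [deconvert(block) for block in extract_unl(page)]


if __name__ == "__main__":
    page = "<html><body>{unl}agt(go(icl>move), boy(icl>person)){/unl}</body></html>"
    # Stand-in servers that only tag their input; real ones would generate text.
    servers = {
        "English": lambda unl: f"[English text for {unl}]",
        "Arabic": lambda unl: f"[Arabic text for {unl}]",
    }
    print(proxy(page, "Arabic", servers))
```

Keeping the servers behind a common function type means adding a language is just registering another deconverter, which mirrors the one-server-per-language layout of the architecture.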
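[Figure: mechanism of conversion between NL and UNL. Natural-language texts are converted into a UNL document by the EnConverter, or annotated with the Annotation Editor and processed by the Universal Parser; UNL documents are checked by the UNL Verifier, published by a web server as HTML+XML, and converted back into natural-language texts by the DeConverter. Supporting resources: UW dictionary, UNL KB, word dictionary, grammatical rules and co-occurrence dictionary.]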
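The UNL Verifier (one of the UNL tools) checks UNL expressions; its exact checks are not described here, so the following is a minimal sketch of two plausible ones under stated assumptions: every relation label must belong to a known set (here a hypothetical subset containing only the labels that appear in these slides, not the full 38), and every scope must contain exactly one @entry node. Relations are assumed to be already parsed into (relation, scope, source, target) tuples; `verify` is a hypothetical name:

```python
# Hypothetical subset of relation labels: only those that appear in these slides.
KNOWN_RELATIONS = {
    "agt", "and", "aoj", "bas", "con", "coo", "dur", "gol",
    "man", "mod", "obj", "plc", "plt", "qua", "scn", "tim",
}

Relation = tuple[str, str, str, str]  # (relation, scope, source, target)


def verify(relations: list[Relation]) -> list[str]:
    """Return error messages; an empty list means both checks passed."""
    errors = []
    entries: dict[str, set[str]] = {}
    for rel, scope, source, target in relations:
        if rel not in KNOWN_RELATIONS:
            errors.append(f"unknown relation {rel!r}")
        for node in (source, target):
            if ".@entry" in node:
                # Record the UW (attributes stripped) marked as entry in this scope.
                entries.setdefault(scope or "top", set()).add(node.split(".@")[0])
    for scope in sorted({scope or "top" for _, scope, _, _ in relations}):
        count = len(entries.get(scope, set()))
        if count != 1:
            errors.append(f"scope {scope!r} has {count} entry nodes")
    return errors


if __name__ == "__main__":
    example = [
        ("agt", "", "go(icl>move).@entry.@past", ":01"),
        ("plt", "", "go(icl>move).@entry.@past", "school(icl>institution)"),
        ("agt", "01", "work(icl>do)", "boy(icl>person).@entry"),
        ("plc", "01", "work(icl>do)", "here"),
    ]
    print(verify(example) or "ok")
```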
UNL as a formal language: How does it represent knowledge?
1- Universal Words (UWs): represent concepts.
Examples: boy(icl>person), hear(icl>perceive(agt>person,obj>thing))
2- Relations: 38 semantic relations are distinguished.
Examples: agt, aoj, bas, con, coo, dur, …
3- Attributes: express the speaker's subjectivity.
Examples: @past, @emphasis, @def, @not, …
4- Knowledge base (UNLKB):
• Defines the Universal Words.
• Provides linguistic knowledge about concepts.
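The first three building blocks can be read off a UNL node mechanically, and the knowledge base's role of defining UWs can be approximated by following icl ("is a kind of") links. A minimal sketch, assuming the node syntax headword(constraint):id.@attribute...; `UWNode`, `parse_node`, `is_kind_of` and `KB_ICL` are hypothetical, and the KB fragment holds only icl links taken from UWs used in these slides, whereas the actual UNLKB stores far richer knowledge:

```python
from dataclasses import dataclass, field


@dataclass
class UWNode:
    headword: str                 # e.g. "boy"
    constraint: str = ""          # e.g. "icl>person"; "" when absent
    node_id: str = ""             # e.g. "0I" in "son(icl>person):0I"
    attributes: list[str] = field(default_factory=list)  # e.g. ["entry", "past"]


def parse_node(text: str) -> UWNode:
    """Split 'headword(constraint):id.@attr1.@attr2' into its parts."""
    text = text.strip()
    depth, end = 0, len(text)
    for i, ch in enumerate(text):  # the UW ends at the first top-level ':' or '.@'
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and (ch == ":" or text.startswith(".@", i)):
            end = i
            break
    uw, rest = text[:end], text[end:]
    node_id = ""
    if rest.startswith(":"):
        node_id, dot, tail = rest[1:].partition(".")
        rest = dot + tail
    attributes = [a.lstrip("@") for a in rest.split(".") if a]
    if "(" in uw and uw.endswith(")"):
        p = uw.index("(")
        return UWNode(uw[:p], uw[p + 1:-1], node_id, attributes)
    return UWNode(uw, "", node_id, attributes)


# Hypothetical KB fragment: only icl links, taken from UWs in these slides.
KB_ICL = {
    "boy": "person",
    "scholar": "person",
    "scientist": "scholar",
    "school": "institution",
    "go": "move",
    "hear": "perceive",
}


def is_kind_of(headword: str, concept: str, kb: dict[str, str] = KB_ICL) -> bool:
    """Follow icl links upward until concept is found or the chain ends."""
    seen = set()
    while headword in kb and headword not in seen:
        seen.add(headword)
        headword = kb[headword]
        if headword == concept:
            return True
    return False
```

Tracking the parenthesis depth is what lets nested constraints such as icl>perceive(agt>person,obj>thing) stay inside the UW.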
Ibrahim Shihata UNL Arabic Center (ISUAC)
• It was established at the Bibliotheca Alexandrina.
• It is responsible for designing, implementing and maintaining the various components of the Arabic language server.
• The Arabic language server will be capable of:
- Enconverting Arabic texts into the universal format.
- Deconverting the universal materials produced by other language centers into Arabic.
The Achievements of the ISUAC
A) Arabic language resources and tools.
B) Developing tools.
C) Arabic language-based universal materials.
A) Arabic language resources and tools:
1- The Arabic Dictionary: a repository of information for all UNL Arabic grammars. It contains:
- Universal Words (vocabulary of UNL)
- Head Words (vocabulary of Arabic)
- Linguistic Features (linguistic information about the head words)
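Because each dictionary entry pairs an Arabic head word with a UW and its features, the same dictionary can serve both enconversion (Arabic to UW) and deconversion (UW to Arabic). A minimal sketch, assuming one entry per (head word, UW) pair; the class names and the feature tags such as "V" and "PROPER" are hypothetical, while the word pairs come from the examples in these slides:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str              # Arabic head word (vocabulary of Arabic)
    uw: str                    # Universal Word (vocabulary of UNL)
    features: tuple[str, ...]  # linguistic features (hypothetical tags)


class ArabicDictionary:
    """Index entries both ways: Arabic -> UW for enconversion, UW -> Arabic for deconversion."""

    def __init__(self, entries: list[Entry]):
        self.by_headword: dict[str, list[Entry]] = defaultdict(list)
        self.by_uw: dict[str, list[Entry]] = defaultdict(list)
        for e in entries:
            self.by_headword[e.headword].append(e)
            self.by_uw[e.uw].append(e)

    def uws_for(self, headword: str) -> list[str]:
        # .get avoids inserting empty lists for unknown words
        return [e.uw for e in self.by_headword.get(headword, [])]

    def headwords_for(self, uw: str) -> list[str]:
        return [e.headword for e in self.by_uw.get(uw, [])]


ENTRIES = [
    Entry("ولد", "born(obj>thing)", ("V",)),
    Entry("عالم", "scientist(icl>scholar)", ("N",)),
    Entry("باحث", "scholar(icl>person)", ("N",)),
    Entry("مرموق", "prominent(aoj>thing)", ("ADJ",)),
    Entry("بونابرت", "Bonaparte(iof>person)", ("N", "PROPER")),
]
```

Returning lists rather than single values leaves room for a head word that maps to several UWs, which the grammar rules would then have to disambiguate.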
2- Arabic EnConversion Rules:
• They are responsible for enconverting Arabic into UNL.
• Arabic EnConversion Rules are able to:
1- Perform morphological analysis to extract the concepts that Arabic words refer to.
2- Assign the exact semantic relation between concepts as expressed in the context of the Arabic sentence.
• Simulation of how the EnConverter works:
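Input sentence: ولد جمال عبد الناصر في 15 يناير 1918 في 18 شارع قنوات حي باكوس بالإسكندرية.
("Gamal Abdel Nasser was born on 15 January 1918 at 18 Kanawat Street, Bakous district, in Alexandria.")
Segmentation: /ولد/ /جمال عبد الناصر/ /في/ /15/ /يناير/ /1918/ /في/ /18/ /شارع/ /قنوات/ /حي/ /باكوس/ /بال/ /إسكندرية/ /./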
[Figure: function words are deleted and the remaining words are mapped to UWs: born(obj>thing).@past, Gamal Abdelnaser(iof>person).@topic, 15, January, 1918, 18, street(icl>road), Kanawat(iof>street), territory(icl>region), Bakous(iof>place) and Alexandria(iof>city). The resulting UNL network links born(obj>thing).@past to Gamal Abdelnaser (obj), to the date (tim) and to the address (plc), with the parts of the date and address attached through mod and plc relations.]
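The first stages of this simulation (detaching clitics, looking words up, deleting function words) can be sketched as a toy. It rests on stated assumptions: the dictionary holds only the words of this sentence (with the UWs shown on the slide), the hypothetical clitic list contains only بال, and relation assignment (obj, tim, plc, mod) is not modelled at all; `segment`, `extract_concepts`, `PREFIXES` and `FUNCTION_WORDS` are illustrative names, not the ISUAC rules:

```python
# Toy Arabic -> UW dictionary covering only the example sentence (UWs from the slide).
DICTIONARY = {
    "ولد": "born(obj>thing)",
    "جمال عبد الناصر": "Gamal Abdelnaser(iof>person)",
    "يناير": "January",
    "شارع": "street(icl>road)",
    "قنوات": "Kanawat(iof>street)",
    "حي": "territory(icl>region)",
    "باكوس": "Bakous(iof>place)",
    "إسكندرية": "Alexandria(iof>city)",
}
PREFIXES = ("بال",)             # hypothetical clitic list: "bi-" (in/at) + article "al-"
FUNCTION_WORDS = {"في", "بال"}  # signal relations, then deleted from the concept list
MAX_WORDS = 3                   # longest multi-word head word in the toy dictionary


def segment(sentence: str) -> list[str]:
    """Split the sentence into words and detach known prefixes."""
    tokens = []
    for word in sentence.rstrip(".").split():
        for prefix in PREFIXES:
            stem = word[len(prefix):]
            if word.startswith(prefix) and stem in DICTIONARY:
                tokens.extend([prefix, stem])
                break
        else:
            tokens.append(word)
    return tokens


def extract_concepts(tokens: list[str]) -> list[str]:
    """Longest-match dictionary lookup; numbers are kept, function words deleted."""
    concepts, i = [], 0
    while i < len(tokens):
        for n in range(MAX_WORDS, 0, -1):
            chunk = " ".join(tokens[i:i + n])
            if i + n <= len(tokens) and chunk in DICTIONARY:
                concepts.append(DICTIONARY[chunk])
                i += n
                break
        else:
            token = tokens[i]
            if token.isdigit():
                concepts.append(token)
            elif token not in FUNCTION_WORDS:
                raise KeyError(f"word not in dictionary: {token}")
            i += 1
    return concepts


SENTENCE = "ولد جمال عبد الناصر في 15 يناير 1918 في 18 شارع قنوات حي باكوس بالإسكندرية."

if __name__ == "__main__":
    print(extract_concepts(segment(SENTENCE)))
```

Longest-match lookup is what keeps the three-word name جمال عبد الناصر together as one concept instead of three.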
3- Arabic DeConversion Rules:
• They are responsible for generating Arabic sentences from UNL networks.
• Arabic DeConversion Rules are able to:
1- Select Arabic words that represent universal concepts.
2- Arrange the concepts of the UNL network into a syntactically well-formed sentence.
• Simulation of how the DeConverter works:
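[Figure: input UNL network with the UWs outcome(icl>result).@entry, description(icl>action), Egypt, collaboration(icl>action), scientist(icl>scholar), scholar(icl>person), more(aoj>thing), prominent(aoj>thing), accompany(agt>thing,obj>thing), Bonaparte(iof>person), 150 and 1798, connected by obj, aoj, mod, agt, and, bas, tim and gol relations. The DeConverter selects an Arabic word for each UW (وصف, مصر, محصلة, تعاون, أكثر, عالم, باحث, مرموق, صاحب, بونابرت), inserts function words (من, و, الذين, إلى, في) and arranges them into a sentence.]
Output sentence: وصف مصر محصلة تعاون أكثر من 150 عالم وباحث مرموق الذين صاحبوا بونابرت إلى مصر في 1798
("The Description of Egypt is the outcome of the collaboration of more than 150 prominent scientists and researchers who accompanied Bonaparte to Egypt in 1798.")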
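The two deconversion steps listed above (word selection, then arrangement) can be illustrated on a much smaller network than the one in the figure. A toy sketch under stated assumptions: a one-level network, a hypothetical three-entry lexicon, and a hypothetical fixed verb-first ordering of the obj, tim and plc dependents, each introduced by an optional particle; it ignores agreement, nested relations and scopes, and `deconvert`, `LEXICON` and `ARRANGEMENT` are illustrative names:

```python
# Hypothetical UW -> Arabic lexicon for a three-relation example.
LEXICON = {
    "born(obj>thing)": "ولد",
    "Gamal Abdelnaser(iof>person)": "جمال عبد الناصر",
    "Alexandria(iof>city)": "الإسكندرية",
}
# Hypothetical arrangement rules: order of dependents after the verb
# (verb-initial order) and the particle that introduces each relation.
ARRANGEMENT = [("obj", ""), ("tim", "في"), ("plc", "في")]


def deconvert(relations: list[tuple[str, str, str]]) -> str:
    """Generate a sentence from (relation, head, dependent) triples of a one-level network."""
    heads = {head for _, head, _ in relations}
    dependents = {dep for _, _, dep in relations}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("expected exactly one entry node")
    entry = roots.pop()
    words = [LEXICON.get(entry, entry)]  # 1- select a word for each UW
    for rel, particle in ARRANGEMENT:    # 2- arrange dependents in a fixed order
        for r, head, dep in relations:
            if r == rel and head == entry:
                if particle:
                    words.append(particle)
                words.append(LEXICON.get(dep, dep))  # numbers pass through unchanged
    return " ".join(words)


NETWORK = [
    ("plc", "born(obj>thing)", "Alexandria(iof>city)"),
    ("obj", "born(obj>thing)", "Gamal Abdelnaser(iof>person)"),
    ("tim", "born(obj>thing)", "1918"),
]

if __name__ == "__main__":
    print(deconvert(NETWORK))
```

The output order is set by the arrangement rules, not by the order in which relations appear in the input, which is the point of separating word selection from arrangement.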
4- A Corpus of Modern Standard Arabic:
• A representative sample (100 million words) that reflects the empirical usage of Modern Standard Arabic.
• It plays a principal role in enhancing and updating both the EnConversion and DeConversion rules.
B) Developing tools:
1- Integrated Development Environment (IDE)
2- Corpus analysis software (GATE)
C) Arabic language-based universal materials:
- Library of Alexandria: the Fourth Pyramid.
- Abou Simbel: The Temple of the Sun.
- Nasser Digital Library.
- The Encyclopaedia of Famous Persons.
An example of an Arabic sentence in UNL (universal) format
وكان جمال عبد الناصر الابن الأكبر لعبد الناصر حسين الذي ولد في عام 1888 في قرية بني مر في صعيد مصر في أسرة من الفلاحين، ولكنه حصل على قدر من التعليم سمح له بأن يلتحق بوظيفة في مصلحة البريد بالإسكندرية، وكان مرتبه يكفي بصعوبة لسداد ضرورات الحياة.
("Gamal Abdel Nasser was the eldest son of Abdel Nasser Hussein, who was born in 1888 in the village of Bani Morr in Upper Egypt into a family of farmers, but he obtained a degree of education that allowed him to take a job in the postal service in Alexandria, and his salary was barely enough to meet the necessities of life.")
{unl}
aoj(son(icl>person):0I.@def.@entry, Gamal Abdel Nasser(iof>person):00)
mod(son(icl>person):0I.@def.@entry, Abd El-Naser Hossain(iof>person):23.@topic)
aoj(old(aoj>thing):1J, son(icl>person):0I.@def)
man(old(aoj>thing):1J, most(icl>how):15)
obj(born(obj>thing):31.@past, Abd El-Naser Hossain(iof>person):23.@topic)
and(get(agt>thing,obj>thing):6S.@past.@contrast, born(obj>thing):31.@past)
scn(born(obj>thing):31.@past, family(icl>group):5Q)
plc(born(obj>thing):31.@past, village(icl>region):4D)
tim(born(obj>thing):31.@past, year(icl>period):3M)
mod(year(icl>period):3M, 1888:41)
plc(village(icl>region):4D, upper Egypt(iof>place):58)
mod(village(icl>region):4D, Bani Morr(iof>village):4S)
mod(family(icl>group):5Q, farmer(icl>person):65.@pl.@def)
obj(get(agt>thing,obj>thing):6S.@past.@contrast, degree(icl>abstract thing):7N)
agt(allow(agt>thing,gol>thing,obj>thing):8M.@past, degree(icl>abstract thing):7N)
mod(degree(icl>abstract thing):7N, education(icl>activity):82.@def)
gol(allow(agt>thing,gol>thing,obj>thing):8M.@past, join(agt>person,obj>thing):9I.@present)
obj(allow(agt>thing,gol>thing,obj>thing):8M.@past, his(pos>he):97)
and(suffice(aoj>thing,obj>thing):CM.@present, join(agt>person,obj>thing):9I.@present)
obj(join(agt>person,obj>thing):9I.@present, job(icl>work):A7)
plc(job(icl>work):A7, postal service(icl>service):AN)
plc(postal service(icl>service):AN, Alexandria(iof>city):BB)
aoj(suffice(aoj>thing,obj>thing):CM.@present, salary(icl>money):BV)
mod(salary(icl>money):BV, his(pos>he):CB)
obj(suffice(aoj>thing,obj>thing):CM.@present, satisfy(agt>thing,obj>thing):DQ)
man(suffice(aoj>thing,obj>thing):CM.@present, hardly:DA)
obj(satisfy(agt>thing,obj>thing):DQ, demand(icl>wants):E6.@pl.@def)
mod(demand(icl>wants):E6.@pl.@def, life(icl>activity):EV.@def)
{/unl}
Language-Independent Format
Is it going to work this way?
• Are there language servers ready to work?
• Are the universal materials deconvertible into other languages?
What about Arabic?
• Is the Arabic language server able to enconvert Arabic texts into the universal format?
• Is it also able to deconvert the universal materials back into Arabic?
A proof of the concept
UNL-based Library Information System (UNL-LIS)
• It is a system for searching digital library catalogs.
• It is built on the UNL KI; therefore:
- The query is in natural language (two languages).
- The answer is also in natural language (7 languages).
UNL LIS Core Architecture
[Figure: UNL LIS core architecture. MARC21 records enter the UNL KB through a MARC21 importing process, together with an encyclopedia of concept definitions. The user's question in NL is enconverted into a question in UNL by a language server (EnCo rules + dictionary); the query engine searches the UNL KB and produces an answer in UNL, which a language server (DeCo rules + dictionary) deconverts into an answer in NL.]
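The query-engine step can be illustrated as matching the enconverted question, treated as a pattern with variables, against relations stored in the UNL KB. This is a hypothetical sketch, not the LIS's documented method: facts are simplified (relation, head, dependent) triples taken from the Naguib Mahfouz UNL document in the demo (node ids and attributes dropped), strings starting with "?" are variables, and `unify`, `match_patterns`, `FACTS` and `QUERY` are illustrative names:

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (relation, head, dependent)
Binding = dict[str, str]       # variable name -> value

FACTS: list[Triple] = [
    ("aoj", "born(aoj>thing)", "Naguib Mahfouz(iof>person)"),
    ("plc", "born(aoj>thing)", "Cairo(iof>city)"),
    ("tim", "born(aoj>thing)", "1911"),
    ("agt", "begin(agt>thing,obj>action)", "Naguib Mahfouz(iof>person)"),
    ("obj", "begin(agt>thing,obj>action)", "writing(icl>action)"),
]


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals fact, or return None."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if result.setdefault(p, f) != f:  # variable already bound to something else
                return None
        elif p != f:
            return None
    return result


def match_patterns(patterns: list[Triple], facts: list[Triple],
                   binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns are found in facts."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match_patterns(rest, facts, extended)


# "Where was Naguib Mahfouz born?" as a hypothetical enconverted pattern.
QUERY: list[Triple] = [
    ("aoj", "?event", "Naguib Mahfouz(iof>person)"),
    ("plc", "?event", "?place"),
]

if __name__ == "__main__":
    for answer in match_patterns(QUERY, FACTS):
        print(answer["?place"])
```

Working on the language-independent triples is what lets one query engine serve questions from any language server; the bound ?place would then be deconverted into the user's chosen answer language.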
Demo: Screen Shots
1. Enter the query.
2. Press to search the Encyclopedia.
3. Specify the result's language (Arabic).
4. View the results here (Naguib Mahfouz). Click for more information.
5. A link to the UNL document.
{unl}
agt(begin(agt>thing,obj>action):12.@past.@entry, Naguib Mahfouz(iof>person):0N.@topic)
obj(begin(agt>thing,obj>action):12.@past.@entry, writing(icl>action):18)
tim(begin(agt>thing,obj>action):12.@past.@entry, year old:1S.@past)
aoj(year old:1S.@past, Naguib Mahfouz(iof>person):0N.@topic)
qua(year old:1S.@past, 17)
plc(born(aoj>thing):00, Cairo(iof>city):08)
aoj(born(aoj>thing):00, Naguib Mahfouz(iof>person):0N.@topic)
tim(born(aoj>thing):00, 1911:0H)
{/unl} [/S]
;;Time 1.4 Sec ;;Done!
{unl}
and(write(agt>thing,obj>thing):1K.@past.@entry, publish(agt>thing,obj>thing):0K.@past)
obj(write(agt>thing,obj>thing):1K.@past.@entry, novel(icl>tale):1B.@pl.@topic)
tim(write(agt>thing,obj>thing):1K.@past.@entry, before(icl>how(obj>thing)):1S)
aoj(more(icl>additional):1A, novel(icl>tale):1B.@pl.@topic)
qua(novel(icl>tale):1B.@pl.@topic, 10:16)
[/S]
Conclusion
• Independence from language is a very important dimension that should be considered in storing and retrieving texts for a UDL.
• The UNL system is a promising formalism for representing knowledge in a universal format.
• Although the ISUAC is less than 2 years old, it is already a very active language centre in designing and implementing UNL materials and tools.
• The UNL LIS has proved the feasibility of the concept of language independence.
Thank You
Questions are welcome.