Semitic Linguistic Phenomena and Variations
Nizar Habash
University of Maryland Institute for Advanced Computer Studies
MT Summit IX Workshop
Machine Translation for Semitic Languages
Introduction
• What this talk is about
– Similarities that define “the Semitic family”
– Variations differentiating members within the family
– Similarities do not go beyond morphology and syntax
– Relevance to NLP and MT
• Most researchers focus on one Semitic language
– Modern Standard Arabic (henceforth, A)
– Modern Hebrew (henceforth, H)
– Arabic Dialect: Palestinian Arabic (henceforth, P)
Road Map
• Introduction
• Orthography
– Phonology
– Scripts
– Spelling
– Ambiguity
• Morphology
• Syntax
• Translation Divergences
• Conclusion
Orthography: Phonology
     Consonants   Vowels                 Script
A    28           6 (3 short, 3 long)    عربي, 36 graphemes
H    18           5                      עברית, 22 graphemes
P    28           10 (5 short, 5 long)   عربي?, 36 graphemes
Orthography: Script
• Alphabets
– Graphemic variants: ككك (27 out of 36), כ ך (5 out of 22)
– Encoding issues
• Optional diacritics (marks added to a base letter such as ש or س)
– Some vowels
– Lack of vowel
– Consonantal doubling
Orthography: Spelling
• Mostly consonantal spelling
– ʃlvm = ʃalom = שלום, slam = salām = سلام
– Dual use of (a w/v j: א ו י, ا و ي) as consonant and vowel
• Diacritics as semantic markers
– זָכָר (zaxar, male) vs. זָכַר (zaxar, to remember)
– كَتَبَ (kataba, to write) vs. كُتِبَ (kutiba, to be written)
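The role of diacritics as semantic markers can be illustrated with a minimal sketch. It assumes that removing Unicode combining marks (category Mn) is an acceptable stand-in for "undiacritized spelling"; the function name `strip_diacritics` and the variable names are illustrative, not part of the talk.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn), which include
    Arabic short-vowel marks and Hebrew vowel points."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# Fully diacritized forms, written with escapes so every mark is explicit.
kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"   # kataba, to write
kutiba = "\u0643\u064f\u062a\u0650\u0628\u064e"   # kutiba, to be written
zaxar_noun = "\u05d6\u05b8\u05db\u05b8\u05e8"     # zaxar, male
zaxar_verb = "\u05d6\u05b8\u05db\u05b7\u05e8"     # zaxar, to remember

# Without diacritics each pair collapses to one consonantal string.
print(strip_diacritics(kataba) == strip_diacritics(kutiba))
print(strip_diacritics(zaxar_noun) == strip_diacritics(zaxar_verb))
```

Both comparisons print True: once the optional marks are dropped, the written forms no longer distinguish the two readings.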
Orthography: Spelling
• Hebrew
– Full spelling, “defective” spelling (כתיב מלא, כתיב חסר)
– kotel (wall): כותל (full), כתל (defective)
• Arabic
– Morphophonemic spelling
– Feminine marker ة (ta marbuta)
• كبير (kabīr, big ♂), كبيرة (kabīra, big ♀)
– Derivation marker
• hawa: هوى (to love), هوا (air)
– Hamza variants (6 characters for one phoneme)
• ء أ آ إ ؤ ئ, as in بهاء, بهاؤه, بهائه
Orthography: Ambiguity
[Figure: mapping between graphemes and phonemes for A (Arabic script) and H (Hebrew script)]
Orthography: Ambiguity
[Figure: mapping between graphemes and phonemes for P (Arabic script) and H (Hebrew script); the chart additionally lists ō, ē and z]
Road Map
• Introduction
• Orthography
• Morphology
– Derivational
– Inflectional
• Noun Inflections
• Verb Inflections
• Syntax
• Translation Divergences
• Conclusion
Morphology: Derivational
• Roots and Patterns
Meaning = (Root.Meaning+Pattern.Meaning)*Idiosyncrasy.Random
A: root ك ت ب (K T B) + pattern م??و? = مكتوب
H: root כ ת ב (K T B) + pattern ??ו? = כתוב
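Root-and-pattern derivation can be sketched as slot filling. This is a minimal illustration, not a morphological analyzer: the function name `interdigitate` and the encoding of patterns as tuples (integers for root slots, strings for pattern letters) are assumptions made for this example.

```python
# Patterns are tuples: an int n means "insert the n-th root consonant",
# a string is a fixed pattern letter. (Hypothetical encoding.)
ARABIC_PATTERN = ("م", 1, 2, "و", 3)   # yields maktūb from K-T-B
HEBREW_PATTERN = (1, 2, "ו", 3)        # yields katuv from K-T-B

def interdigitate(root: str, pattern: tuple) -> str:
    """Fill the numbered slots of a pattern with the root consonants."""
    return "".join(root[p - 1] if isinstance(p, int) else p for p in pattern)

arabic_word = interdigitate("كتب", ARABIC_PATTERN)
hebrew_word = interdigitate("כתב", HEBREW_PATTERN)
print(arabic_word, hebrew_word)
```

The sketch covers only the form side of the equation; the meaning side, including the idiosyncrasy factor, is not modeled.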
Morphology: Root Meaning
• KTB: writing “stuff”
– write: كتب (A), כתב (H)
– writer: كاتب (A), כתב (H)
– letter: مكتوب (A), מכתב (H)
– book: كتاب (A)
– library: مكتبة (A)
– office: مكتب (A)
– spelling: כתיב (H)
– address: כתובת (H)
• LHM-2 (battle sense)
– ملحمة
• Fierce battle, massacre, epic
– מלחמה לוחמה לחם לוחם לחימה
• War, battle, quarrel, conflict, combat, warfare, belligerence, fighting, quarreling, fighter, militarism, militancy, bellicosity
Morphology: Root Meaning
• LHM-3 (Solder sense)
– ملتحم التحم تلاحم لحم لحمة
• Weld, solder, get stuck, cling together, merged, fused, kinship
– לחם הלחים מולחם מלחם
• Solder, soldered, soldering iron
Morphology: Root Meaning
• LHM-4 (Conjunctiva sense)
– لحمية
• conjunctiva
– לחמית
• conjunctiva
Morphology: Root Meaning
Morphology: Noun Inflections
A: وكبيوتنا = و (conj) + ك (prep) + بيوت (noun, plural) + نا (poss)
   And-like-houses-our: And like our houses
H: שבבית = ש (conj) + ב (prep) + ה (article) + בית (noun)
   That-in-the-house: Which is in the house
• Arabic broken plurals
• Hebrew ambiguous definiteness
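The two examples, and the definiteness ambiguity in Hebrew, can be sketched as simple concatenation of morphemes. The sketch assumes one orthographic rule: the article ה is not written after the attached prepositions ב, כ, ל. The function name `attach_hebrew` and the constant are hypothetical.

```python
HEBREW_ATTACHED_PREPOSITIONS = {"ב", "כ", "ל"}

def attach_hebrew(morphemes: list) -> str:
    """Concatenate Hebrew morphemes, dropping the written article
    when it directly follows an attached preposition."""
    out = []
    for m in morphemes:
        if m == "ה" and out and out[-1] in HEBREW_ATTACHED_PREPOSITIONS:
            continue  # the article leaves no letter in undiacritized text
        out.append(m)
    return "".join(out)

definite = attach_hebrew(["ש", "ב", "ה", "בית"])    # that-in-the-house
indefinite = attach_hebrew(["ש", "ב", "בית"])       # that-in-a-house
arabic = "".join(["و", "ك", "بيوت", "نا"])          # and-like-houses-our

print(definite == indefinite)  # the two readings share one surface form
```

Going the other way, from surface form to morphemes, is the hard direction: a single written word maps to more than one analysis.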
Morphology: Verb Inflections
Labels: conj, neg, tense, subj, verb, object, IOBJ
A: وسنكتبها
   And-will-we-write-it: And we will write it
H: ואהבתיה
   And-loved-I-her: And I loved her
P: وماحتستعمليلوش
   And-not-will-use-you-for-it-not: And you will not use for it
Morphology: Verb Inflections
• Perfect Verb Derivation (Suffixes only)
     1st Person Singular   2nd Person Singular ♂   2nd Person Singular ♀
A    كتبتُ katabtu          كتبتَ katabta            كتبتِ katabti
H    כתבתי katavti         כתבת katavta            כתבת katavt
P    كتبت katabt           كتبت katabt             كتبتي katabti
• Imperfect Verb Derivation (Prefix+Suffix)
     1st Person Singular   2nd Person Singular ♂   2nd Person Singular ♀
A    اكتب aktubu           تكتب taktubu            تكتبين taktubīna
H    אכתוב extov           תכתוב textov            תכתבי textevi
P    اكتب aktob            تكتب toktob             تكتبي toktobi
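The suffix-only perfect paradigm can be written down as a small lookup, using the transliterations from the table. The dictionary layout and the function name `perfect` are assumptions for illustration. The P entry for 2nd person ♂ assumes it shares the 1st person form, which is general knowledge about Palestinian Arabic rather than something the extracted table states.

```python
PERFECT_STEM = {"A": "katab", "H": "katav", "P": "katab"}

PERFECT_SUFFIX = {
    "A": {"1sg": "tu", "2sg_m": "ta", "2sg_f": "ti"},
    "H": {"1sg": "ti", "2sg_m": "ta", "2sg_f": "t"},
    "P": {"1sg": "t", "2sg_m": "t", "2sg_f": "ti"},  # 2sg_m is an assumption
}

def perfect(language: str, person: str) -> str:
    """Build a transliterated perfect form as stem + suffix."""
    return PERFECT_STEM[language] + PERFECT_SUFFIX[language][person]

for lang in ("A", "H", "P"):
    print(lang, [perfect(lang, p) for p in ("1sg", "2sg_m", "2sg_f")])
```

The same suffix marks different persons across the three languages: ti is 2nd person ♀ in A and P but 1st person in H.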
Morphology: Semantics of Verb Inflections
H
– Perfect: כתב katav (Past)
– Imperfect: יכתוב jixtov (Future)
– Participle: כותב kotev (Present)
A
– Perfect: كتب kataba (Past)
– Imperfect: يكتب jaktubu (Present); سيكتب sajaktubu (Future)
– Participle: كاتب kātib (0-Tense)
P
– Perfect: كتب katab (Past)
– Imperfect: يكتب jiktob (0-Tense); حيكتب ħajiktob (Future); بيكتب bjoktob (Present)
– Participle: كاتب kāteb (0-Tense)
Road Map
• Introduction
• Orthography
• Morphology
• Syntax
– Sentence Structure
– Noun Phrase Structure
• Translation Divergences
• Conclusion
Sentence Structure
• Sentence structure
– Copular sentences
– Verbal sentences
• Copular sentences
– Topic Complement
– Definite Indefinite
– الكلب كبير (A), הכלב גדול (H)
– The-dog big
– *كلب كبير (topic comp: dog big)
Sentence Structure
• Verbal sentences
– The children wrote the poems
– A: Verb Subject Object
• كتب الاولاد الاشعار
• Wrote the-children the-poems
– H, P: Subject Verb Object
• הילדים כתבו את השירים
• The-children wrote obj the-poems
• الاولاد كتبوا الاشعار
• The-children wrote the-poems
Noun Phrase
• Noun Adjective
• Noun-Adjective Agreement
– number, gender, definiteness
– a big dog ♂: كلب كبير (A), כלב גדול (H); dog♂ big♂
– a big dog ♀: كلبة كبيرة (A), כלבה גדולה (H); dog♀ big♀
– big dogs ♂: كلاب كبار (A), כלבים גדולים (H); dogs♂ big♂+pl
– the big dog ♂: الكلب الكبير (A), הכלב הגדול (H); the-dog♂ the-big♂
Noun Phrase
• סמיכות / اضافة (idafa/smixut)
• Noun1 of Noun2 encoded structurally
– Noun1-indefinite Noun2-definite
– ملك الاردن (A), מלך ירדן (H)
– king Jordan = the king of Jordan / Jordan’s king
• Noun1 Form Change
– Feminine (H and P)
• מלכה + ירדן → מלכת ירדן Queen of Jordan
– Plural (A and H)
• מלכים + ירדן → מלכי ירדן Kings of Jordan
• Alternatives (only H and P)
– Noun1 <particle> Noun2
– الملك تبع الاردن the-king belonging-to Jordan
– המלך של ירדן the-king that-for Jordan
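The Noun1 form change in the Hebrew examples can be sketched with two suffix rules. This is a deliberately simplified illustration: real construct forms involve vowel changes and many irregular nouns, and the function names `construct_form` and `smixut` are hypothetical.

```python
def construct_form(noun: str) -> str:
    """Return a simplified Hebrew construct form of Noun1."""
    if noun.endswith("ים"):      # masculine plural: drop the final mem
        return noun[:-1]
    if noun.endswith("ה"):       # feminine singular: he becomes tav
        return noun[:-1] + "ת"
    return noun                  # otherwise unchanged in consonantal spelling

def smixut(noun1: str, noun2: str) -> str:
    """Join Noun1 (in construct form) with Noun2."""
    return f"{construct_form(noun1)} {noun2}"

king = smixut("מלך", "ירדן")      # king of Jordan
queen = smixut("מלכה", "ירדן")    # queen of Jordan
kings = smixut("מלכים", "ירדן")   # kings of Jordan
print(king, queen, kings, sep="\n")
```

The particle alternative needs no form change at all, which is one reason it is simpler to generate.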
Translation Divergences
• Variations beyond syntax
• How languages map semantics to syntax
• As complex and diverse as any other language
• Divergence Dimensions
– Categorial Variation (develop → development)
– Conflation (become frozen → freeze)
– Inflation (freeze → become frozen)
– Structural (enter the room → enter into the room)
– Head Swap (swim across → cross swimming)
– Thematic (John likes Mary → Mary pleases John)
• I have a dog: have(I, dog)
– A: عندي كلب (at-me dog)
– H: יש לי כלב (there for-me dog)
[Figure: dependency trees for the English, Arabic and Hebrew sentences]
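Translation Divergences: conflation
• I am not here: be(I, here, not)
– A: لست هنا (I-am-not here)
– H: אני לא פה (I not here)
[Figure: dependency trees headed by be, ليس and an empty head (*)]
Translation Divergences: conflation
• I am cold: be(I, cold)
– A: انا بردان (I cold)
– H: קר לי (cold for-me)
[Figure: dependency trees headed by be and by an empty head (*)]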
Translation Divergences: thematic
• I found the man: find(I, man)
– A: عثرت على الرجل (found-I upon the-man)
– H: מצאתי את האיש (found-I obj the-man)
[Figure: dependency trees headed by find, عثر and מצא]
Translation Divergences: structural
• I found the man: find(I, man)
– عثرت على الرجل (found-I upon the-man)
– لقيت الرجال (found-I the-man)
[Figure: dependency trees headed by find, عثر and لقى]
Translation Divergences: structural
• I swam across the river quickly: swim(I, across(river), quickly)
Translation Divergences: head swap and categorial
• I swam across the river quickly: swim(I, across(river), quickly)
– A: اسرعت عبور النهر سباحة (I-sped crossing the-river swimming)
[Figure: dependency trees headed by swim and اسرع]
Translation Divergences: head swap and categorial
– H: חציתי את הנהר בשחיה במהירות (I-crossed obj river in-swim speedily)
[Figure: dependency tree headed by חצה]
Translation Divergences: head swap and categorial
[Figure: the three dependency trees (headed by חצה, اسرع and swim) side by side, with nodes labeled by part of speech: noun, prep, verb, adverb]
Conclusion
• Many defining features of the Semitic family
– Orthographic conventions, morphological derivation and inflection, phrase structure, etc.
• Many variations that create different kinds of ambiguities and problems
– Phonology of orthography, semantics of derivation and inflection
• Do similarities extend beyond morphology and syntax?
– Translation divergences within Semitic family
– Ambiguity preservation between Semitic languages