Multiword Expressions Facilitate, not Hinder, Understanding
9 November 2006
Jerry Ball
Senior Research Psychologist
Human Effectiveness Directorate
Air Force Research Laboratory
2
Multiword Expressions: “A Pain in the Neck”?
• According to Sag et al. (2002) Multiword Expressions (MWEs) are a “pain in the neck” for developing Natural Language Processing (NLP) systems
• MWEs must be handled as exceptions to a word-based compositional semantics
– Meaning of MWEs cannot be determined from meanings of individual words composed together according to syntax
• Unfortunately, MWEs are ubiquitous in natural language
• Sag, I., Baldwin, T., Bond, F., Copestake, A. and Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics
3
Multiword Expressions: “A Pain in the Neck”?
• Maybe the current word-based compositional semantic approach to building NLP systems is missing something!
– Words are the base meaningful units
– Words are the base units of recognition
– Meaning of expression is composed from meanings of words recognized independently and combined syntactically
• But humans recognize and understand linguistic units holistically at multiple levels, not just words
– Letter, Phoneme, Syllable, Morpheme, Word, Phrase, Text
4
Identifying Letters in Words
Count the number of F's in the following text:
FINISHED FILES ARE THE
RESULT OF YEARS OF SCIENTIFIC
STUDY COMBINED WITH THE
EXPERIENCE OF YEARS
7
Composing Words from Letters
• The word “of” is recognized holistically
• “of” is not recognized by recognizing “o” and recognizing “f” and combining them to get “of”
• Words can be recognized without recognizing the individual letters
• Even when the task is to identify letters, this can be difficult for very common words
– The “f” in “of” is perceptually implicit
Healy, A. F. (1976). Detection errors on the word The: Evidence for reading units larger than letters. Journal of Experimental Psychology: Human Perception & Performance, 2, 235-242.
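The count asked for in the exercise can be checked mechanically. The following is a minimal Python sketch (the variable names are illustrative and not from the talk): it counts every F in the passage and then the subset that sits inside the word OF, which is where readers tend to miss them.

```python
# The passage from the letter-counting exercise, joined into one string.
text = (
    "FINISHED FILES ARE THE "
    "RESULT OF YEARS OF SCIENTIFIC "
    "STUDY COMBINED WITH THE "
    "EXPERIENCE OF YEARS"
)

# Total number of F's in the passage.
total_f = text.count("F")

# Number of F's that occur inside the word "OF".
f_in_of = sum(word.count("F") for word in text.split() if word == "OF")

print(total_f, f_in_of)  # 6 3
```

Half of the F's are in "OF", which is consistent with the claim that the letter stays perceptually implicit when the word is recognized as a whole.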
8
Identifying Words
rscheearch ltteer waht lteter oredr wrod pclae deosn't olny tihs taht frist uinervtisy lsat rghit rset toatl mttaer mses iprmoetnt raed aoccdrnig wouthit porbelm cmabrigde ltteers bcuseae huamn deos raed sitll mnid ervey istlef tihng wrod wlohe
12
Identifying Words in Context
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy it deosn't mttaer in waht oredr the ltteers in a word are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
Rawlinson, G. E. (1976) The significance of letter position in word recognition. Unpublished PhD Thesis, Psychology Department, University of Nottingham, Nottingham UK.
http://www.mrc-cbu.cam.ac.uk/~mattd/Cmabrigde/
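The jumbling used in the passage above is easy to reproduce. The sketch below is an illustration only, not the procedure used in the cited work: it shuffles the interior letters of each word and leaves the first and last letters in place. The function names and the fixed random seed are assumptions, and punctuation attached to a word is treated as part of that word for simplicity.

```python
import random


def jumble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        # Words of up to three letters have at most one interior letter.
        return word
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]


def jumble_text(text, seed=0):
    """Jumble every whitespace-separated word in the text."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return " ".join(jumble_word(w, rng) for w in text.split())


sample = "the human mind does not read every letter by itself"
print(jumble_text(sample))
```

Jumbling a single word in isolation and the same word inside a phrase gives a quick way to build materials like "toatl" versus "a toatl mses".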
13
Words in Context are Easier to Recognize
• It is easier to recognize words whose letters are jumbled within an expression than to recognize isolated words with jumbled letters
– toatl
– a toatl mses
• More noise in the input can be tolerated when recognizing larger units
• If the linguistic unit can’t be recognized, the meaning cannot be determined!
– Larger units facilitate recognition → larger units facilitate understanding
15
What’s Wrong With Compositional Semantics!
• Meaning of expression is composed from meaning of words recognized independently
– Meaning of “black cat” equals meaning of “black” + meaning of “cat”
• MWEs must be treated as exceptions
– Meaning of “black ice” does not equal meaning of “black” + meaning of “ice”
– “black ice” is actually clear, not black!
• Why not recognize the largest units of meaning and simplify the problem!
– Don’t treat MWEs as exceptions
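One way to picture "recognize the largest units of meaning" is a greedy longest-match lookup against a lexicon that stores MWEs alongside single words. The sketch below is a toy illustration under that assumption; the lexicon entries, glosses, and function name are hypothetical and do not describe the system behind this talk.

```python
# Hypothetical toy lexicon: keys are word tuples, values are rough glosses.
LEXICON = {
    ("black", "ice"): "thin transparent ice on a road",
    ("black",): "the color black",
    ("ice",): "frozen water",
    ("cat",): "small domesticated feline",
}
MAX_LEN = max(len(key) for key in LEXICON)


def chunk(words):
    """Greedy longest-match segmentation of a word list into lexicon units."""
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then fall back to shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            # A single unknown word is kept as its own unit.
            if candidate in LEXICON or n == 1:
                units.append(candidate)
                i += n
                break
    return units


print(chunk("black ice".split()))  # [('black', 'ice')]
print(chunk("black cat".split()))  # [('black',), ('cat',)]
```

With this ordering, "black ice" is retrieved as one unit and never composed from "black" and "ice", while "black cat" still falls through to word-by-word composition.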
16
High Frequency Words
• The meaning of high frequency words like “take” and “have” cannot be determined in isolation from the expressions in which they occur
– Take “take” for instance
– Take a hike
– Take five
– Take place
– Have a blast
– Don’t have a cow
– Have at it
17
High Frequency Words
• Why are high frequency words the most ambiguous?
– It isn’t possible to have a separate word for every concept that may need to be expressed
– Some words must be used in the expression of multiple concepts
– The words used in the expression of multiple concepts are necessarily ambiguous and tend to be high frequency
18
Syllables, Morphemes or Words?
• Possible but irrelevant words or morphemes within words are better recognized as meaningless syllables
– It does not make sense to try to compose the meaning of “carpet” from the meanings of “car” and “pet”!
– How do we avoid recognizing “car” and “pet” as meaningful?
• Words in MWEs often function more like meaningless syllables than independent meaningful units!
– The meanings of “ad” and “hoc” in “ad hoc”
• Although “ad” and “hoc” have meanings in Latin
– The meaning of “blue” in “blue moon”
• Even if the meaning of “blue” is initially activated by “blue”, it is not part of the meaning of “blue moon”
19
Syllables, Morphemes, Words or Expressions?
• No sharp divide between syllables, morphemes, words and expressions
– “nonetheless” vs. “none the less”
• Is “none” a syllable or morpheme in “nonetheless” or a word in “none the less”?
– “whatever” vs. “what ever”
– “alot” vs. “a lot”
– “whatchamacallit” vs. “what do you call it”
20
What are Acronyms?
• Acronyms are MWEs that are perceptually re-encoded as a sequence of letters (written) or syllables corresponding to letters (spoken)!
– “AFMC” vs. “Air Force Materiel Command”
• Acronyms allow a single perceptual unit to encode an entire MWE!
– Overcome limitations of visual and aural perceptual span
21
Frequency of Multiword Expressions
• Conventionalized Expressions
– We say “baked potato” and “roast beef” not “baked beef” or “roast potato” (although “roast potatoes” is OK)
– 25% of expressions are conventionalized ways of saying things!
– Erman, B. & Warren, B. (2000). The idiom principle and the open choice principle. Text Vol 20 pp. 29-62
• Formulaic Language
– As much as 70% of our adult native language may be formulaic! (Altenberg, 1990)
– Wray, A. & Perkins, M. (2000). The functions of formulaic language: an integrated model. Language & Communication 20, pp. 1-29
– Altenberg, B. (1990). Speech as linear composition. Proceedings of the Fourth Nordic Conference for English Studies
22
Frequency of Multiword Expressions
• The number of MWEs in a speaker’s lexicon is of the same order of magnitude as the number of single words
– Jackendoff, R. (1997). The Architecture of the Language Faculty. Cambridge, MA: The MIT Press.
• In WordNet, 41% of entries are multiword
• The number of MWEs increases in specialized domains and acronyms are ubiquitous!
– “AFRL” vs. “Air Force Research Laboratory”
– “BRAA” vs. “Bearing, Range, Altitude & Aspect”
• Sag, I., Baldwin, T., Bond, F., Copestake, A. and Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics
23
Processing Efficiency
• There really isn’t time to process spoken input one word at a time
– Word-based compositionality is computationally too expensive
• Even if each word in a 20-word sentence has only 3 meanings (on average), there are 3^20 (about 3.5 billion) possible combinations!
• Extensive search is not a cognitively viable option
• There must be constraints that minimize the number of alternatives
• MWEs offer one such constraint
• MWEs are directly retrievable from memory, reducing the amount of processing required to determine meaning
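The arithmetic behind this argument can be worked through directly. With 3 candidate meanings for each of 20 words, the number of combinations is 3 multiplied by itself 20 times. The sketch below also shows what happens if MWE recognition groups those words into fewer units. The figure of 12 units is purely an assumption for illustration, as is keeping 3 candidate meanings per unit, which is pessimistic given that larger units tend to be less ambiguous.

```python
def reading_count(num_units, senses_per_unit):
    """Number of sense combinations when every unit is resolved independently."""
    return senses_per_unit ** num_units


# Word-by-word: 20 words with 3 meanings each.
word_by_word = reading_count(20, 3)

# Assumed for illustration: the same sentence grouped into 12 units
# after MWE recognition, still with 3 candidate meanings per unit.
chunked = reading_count(12, 3)

print(word_by_word)             # 3486784401
print(chunked)                  # 531441
print(word_by_word // chunked)  # 6561
```

Every word absorbed into a larger unit divides the search space by the number of meanings that word would otherwise have contributed.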
24
Processing Efficiency
• Humans can recognize letters in words more rapidly than letters in isolation
– Word Superiority Effect
• Can humans recognize words in MWEs more rapidly than recognizing words in isolation?
– Multiword Superiority Effect?
– Suggested by our ability to complete unfinished MWEs without seeing or hearing the entire final word
• “kicked the bu…”
• “spill the b… ”
– Suggested by the Cambridge Study example
25
Processing Efficiency
• Perceptual processing is constrained by the visual perceptual span in reading and the size of the phonological buffer in speech
– Mechanisms that shorten the visual and aural span should facilitate processing
– Mechanisms that link perceptual units to larger units of meaning should facilitate processing
26
Processing Efficiency
• Acronyms and abbreviations support efficient processing
– “HE” vs. “Human Effectiveness Directorate”
– “AFRL/HE” vs. “Air Force Research Laboratory…”
• They achieve this by associating a perceptual unit with a larger unit of meaning
– “HE” is perceived as a unit
– “HE” is stored as a unit and linked to “Human Effectiveness Directorate” which is also stored as a unit
• Sometimes the original expression is lost or modified
– “AOC” vs. “Air and Space Operations Center”
– “RADAR” vs. ??? (Radio Detection and Ranging)
27
Processing Efficiency
• Recognition of larger units competes with recognition of smaller units
– If larger unit is recognized first, smaller unit remains implicit unless task requires accessing smaller unit
• Recognition of smaller units of meaning is detrimental to understanding in many cases!
– Irrelevant meanings
• “car” in “carpet”
• “a” in “a priori”
– Literal interpretation of non-literal language
• “Have a nice day!”
• “I wasn’t going to, but if you say so!”
28
Processing Efficiency
• MWE Storage
– Humans have a powerful associative memory
– Storage of frequently occurring MWEs is psychologically plausible
• MWE Perception
– MWEs may be holistically perceivable
– Perhaps in single fixation when reading
• Advantage of acronyms and abbreviations in English
• In written Hebrew, only consonants are written, which should facilitate recognition of MWEs
– Via some concatenation mechanism in speech
29
Why MWEs are good!
• The larger the linguistic unit, the less likely to be ambiguous
• The larger the linguistic unit, the less susceptible to noise
• The larger the linguistic unit, the more rapidly it can be recognized relative to individually recognizing the lower level elements of the unit
• Bigger is better!
30
Summary
• Humans have little difficulty understanding MWEs
• NLP systems should be designed to handle MWEs as part and parcel of what they do, not treat them as exceptions that are a “pain in the neck”!
• The result will be better NLP systems!
31
Questions?
32
Perceiving Larger Linguistic Units
• Phonological Loop
– 2 seconds of spoken input
– Baddeley, A. (???)
• Visual fixations
– 4 letters to left of fixation
– 9 letters to right of fixation
– Carpenter & Just (???).
33
Storing Larger Linguistic Units
• Long-Term Memory
– About 4 Distinct Units in a Single Declarative Memory Chunk
– Hierarchically organized
– No limit to depth of hierarchy
• Short-Term Working Memory
– Phonological Loop
– Visuo-Spatial Sketch Pad
34
Change in Meaning → Change in Form
• Changes in meaning often result in changes in form
• Grammaticization Processes
– “going to” → “gonna”
– “want to” → “wanna”
– Bybee (2001) explains the processes of reduction and drift by which frequently co-occurring words come to have unique phonology (i.e. perceptual form) and meaning
– Bybee, J. (2001). Phonology and Language Use. Cambridge, UK: Cambridge University Press
• Specialized Uses Lead to Specialized Pronunciation
– “Whatever” used as a negative response
– “Bad” used to mean “Good”