Natural language processing (and computational linguistics)
• Computational linguistics: computer science, theory, cognition, algorithms
• Natural language processing: software development, application, practical techniques
• Computer methods and their usefulness (or uselessness) for human language processing (textual, spoken, gestural, etc.)
• Implementation of techniques, procedures, and algorithms for language computation
• Enabling human–machine communication; enhancing human–human communication
[Diagram: NLP at the intersection of computer science, psychology/cognitive science, linguistics, math/statistics, philosophy, and communication]
• Tokenization
• Part-of-speech tagging
• Computational morphology
• Syntactic parsing
• Lexical relations
• Dialogue move engines
• Dialectizer
• Speech recognition (speech to text)
• Speech synthesis (text to speech)
• Diacritization, Romanization
• Corpus annotation (Syriac)
• Thought identification
• Question answering
• Summarization
• Natural language generation
• Machine translation
• Spoken language identification
• Spoken language translation
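To make the first task on the list concrete, tokenization can be sketched in a few lines. This is a naive regex approach for illustration only, not the course's actual implementation; the function name `tokenize` is our own.

```python
import re

def tokenize(text):
    """Split raw text into word and punctuation tokens (a naive sketch)."""
    # \w+ grabs runs of word characters; [^\w\s] grabs single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Dr. Smith's car won't start."))
```

Even this toy version shows why tokenization is nontrivial: abbreviations and clitics ("Dr.", "won't") are split in ways a real tokenizer must handle deliberately.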
• Humanities, natural and behavioral sciences, and engineering
• Linguistics, computer science, psychology, and mathematics
• Theory and practice, science and art
• Models, foundations vs. corpora, data (top-down vs. bottom-up)
• Math: statistics, calculus, algebra, modelling
• Computational paradigms: connectionist, rule-based, cognitively plausible
• Linguistics: LFG, HPSG, GB, OT, CG, etc.
• Architectures: stacks, automata, networks, compilers
• Several approaches implemented and taught here
• Homegrown: analogical modeling (AM)
• State-of-the-art performance in various applications for various languages:
  • Written language identification
  • Part-of-speech tagging
  • Morpheme boundary detection
  • Named entity recognition
  • Word sense disambiguation
  • Shallow parsing
  • Semantic role labeling
  • Spoken language identification
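As one illustration of the listed applications, written language identification can be sketched with character n-gram profiles. This is a generic textbook technique, not the analogical-modeling (AM) system described here; the profiles and function names are our own.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character trigrams, with padding spaces at the edges."""
    text = f" {text.lower()} "
    return Counter(text[i:i+n] for i in range(len(text) - n + 1))

def identify(text, profiles):
    """Pick the language whose training profile shares the most n-gram mass."""
    grams = char_ngrams(text)
    def overlap(profile):
        return sum(min(count, profile[g]) for g, count in grams.items())
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

# Tiny toy "training data"; real systems use large corpora per language.
profiles = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog and the cat"),
    "fr": char_ngrams("le renard brun saute par-dessus le chien paresseux et le chat"),
}
print(identify("the dog and the fox", profiles))  # en
```

With realistic training corpora, this simple overlap scheme already discriminates many written languages reliably.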
[UML-style class diagram: a Car has Year, Make, Model, Mileage, Price, and Feature attributes; a PhoneNr (with Extension) is for a Car; cardinalities include 1..*, 0..1, and 0..*]
• Work on information extraction (data-rich text, web)
• Recognition and extraction of low-level data elements
• Ontology-based
• Related applications: ontology generation, text similarity and classification, information integration, etc.
• NSF-funded
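As a toy illustration of recognizing low-level data elements, the sketch below pulls a few fields out of obituary-like text with regular expressions. The field names and patterns are our own invention; a real ontology-based extractor would also draw on lexicons (e.g. place names) and document structure.

```python
import re

# Toy recognizers for low-level data elements; patterns are illustrative only.
PATTERNS = {
    "age":        re.compile(r"\baged?\s+(\d{1,3})\b"),
    "death_date": re.compile(r"\bdied\s+(?:on\s+)?(\w+ \d{1,2}, \d{4})"),
    "phone":      re.compile(r"\b(\d{3}-\d{3}-\d{4})\b"),
}

def extract(text):
    """Return the first match for each recognized field in the text."""
    facts = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            facts[field] = match.group(1)
    return facts

print(extract("John Doe, aged 87, died on March 3, 1999. Call 801-555-0142."))
```

The issues listed on the next slide (typos, unnamed deceased, factored lists, anaphora) are exactly where pattern-level extraction like this breaks down.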
Results and issues
• Corpus of 1,500 obituaries, 500 hand-annotated
• Preliminary evaluation on a few features: name, age, title, birth date, death date, death place, funeral time/location
• Results: around 80% precision, slightly less on recall
• Lexicon coverage (especially place names)
• Occasional typos
• Deceased sometimes not named
• Factored lists: "Pierre et Marie, son fils et belle-fille" ("Pierre and Marie, his son and daughter-in-law")
• Anaphora resolution: "Né à Paris et y décédé…" ("Born in Paris and died there…")
[Table excerpt: extracted family facts, e.g. grandchildren of Mary Ely]
• Number of facts extracted: 22,251
  • 8,740 Person-BirthDate facts
  • 3,803 Person-DeathDate facts
  • 9,708 children facts, including:
    ▪ 5,020 Child-has-parent-Person facts
    ▪ 2,394 Son-of-Person facts
    ▪ 2,294 Daughter-of-Person facts
• Number of implied grandchild facts inferred: 5,277
• Processing time: ~18 seconds per page; CPU time: ~4 hours
• Precision: 0.52 (spot-checking 100 of the 22,251 facts)
• Recall: 0.33 and precision: 0.40 (spot-checking 2 fact-filled family pages)
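The implied grandchild facts can be derived by joining the parent-of facts with themselves. A minimal sketch, with hypothetical data and a function name of our own (this is not the project's actual inference pipeline):

```python
def infer_grandchildren(parent_of):
    """parent_of: set of (parent, child) pairs.
    Returns the set of (grandparent, grandchild) pairs implied by a self-join."""
    children = {}
    for parent, child in parent_of:
        children.setdefault(parent, set()).add(child)
    # A grandchild is any child of any of a person's children.
    return {(gp, gc)
            for gp, kids in children.items()
            for kid in kids
            for gc in children.get(kid, set())}

# Hypothetical facts in the spirit of the extracted family pages.
facts = {("Mary Ely", "John"), ("John", "Alice"), ("John", "Bob")}
print(sorted(infer_grandchildren(facts)))
# [('Mary Ely', 'Alice'), ('Mary Ely', 'Bob')]
```

Because inferred facts compound the errors of the base facts they join, precision on implied relations is typically lower than on directly extracted ones, consistent with the spot-check numbers above.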
“Find a BBQ restaurant near the Umeda station, with typical prices under $40”
Language-Agnostic Ontology
Oral proficiency testing for language learners
• Sentences presented aurally, repeated back (elicited imitation, EI)
• Carefully engineered for vocabulary level, grammatical complexity, length in syllables
• Score responses with forced alignment
• Correlate to standard testing methods
• English, French, Spanish, Japanese
• In use at language training facilities, universities, industry
• Too short: just a working-memory task with parroting
• Too long: impossible to repeat
• Too complex: even native speakers can't repeat
• Too simple: can't discriminate non-native speakers' levels
• EI item design is a linguistic engineering task!
  • Sentence length
  • Sentence complexity
  • Vocabulary levels
  • Breadth of sampling of grammatical structures, constructions
681,925 annotated sentences of length 5-20 words
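The system scores responses by forced alignment of the learner's audio; as a rough text-level stand-in, the sketch below scores a transcribed repetition against the target sentence with word-level edit distance. The function names `edit_distance` and `ei_score` are our own, not part of the system described.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between reference and hypothesis."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i-1] == hyp[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1,      # deletion
                          d[i][j-1] + 1,      # insertion
                          d[i-1][j-1] + cost) # substitution / match
    return d[-1][-1]

def ei_score(target, response):
    """Fraction of the target reproduced correctly (1.0 = perfect repetition)."""
    ref, hyp = target.lower().split(), response.lower().split()
    return 1.0 - edit_distance(ref, hyp) / max(len(ref), 1)

print(ei_score("the boy who ran fast won the race",
               "the boy ran fast won race"))  # 0.75
```

Scores like this, aggregated over a bank of carefully graded items, are what get correlated against standard proficiency measures.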
• NLP in a cognitive modeling framework
• Goal-directed, incremental
• Machine learning
• Trying to model/mimic human performance in language tasks
• Several modalities: parsing, generation, translation, dialogue
Cognitive modeling
• Model human behavior: agent-based, goal-directed, representation of world, decomposable actions, learned skills, behaviors, expertise, memory
• Fatigue, emotion, attention, overload, confusion
• Plausible: processes, time course, constraints
• Robots: explore control, agency, interaction
• Language: cognition, acquisition, modeling, agency, incrementality, discourse/dialogue, process (parsing, lexical access, generation, translation, …)
• Develop NLP capability in Soar
• Parsing, generation, discourse/dialogue, translation, speech
• Fit models of human performance data
• Incremental, learning, agent-based
• WordNet, other resources for lexical info
• English, French, Japanese
• Use in HCI, modeling (reading, acquisition), task interactions, emotion, attention, ambiguity resolution, parser breakdown, etc.
[Diagrams: dialogue architecture linking comprehension and generation]
• Operationalize language processing of all kinds (mostly for DoD)
• Machine translation, sentiment analysis, dialect recognition, prevarication detection, etc.
• Beyond the current paradigms, language resources (cf. trained on newswire)
• MT and CLIR (A), HCI English+Arabic (B), ST English+Arabic (C), Arabic dialects (D)
• Activity E: language, agents, and robotics
• Grounded language acquisition by robots
• Deep semantics, visual+tactile input, experiential learning of objects, actions, and consequences
• Acquires language via grounding, hypothesizing, automated reasoning
• Human guides acquisition via situated, interactive instruction
• Robot demonstrates understanding via performance
• Social band (10^5 to 10^7 s: days to months)
• Rational band (10^2 to 10^4 s: minutes to hours)
• Cognitive band (10^-1 to 10^1 s: 100 ms to 10 s)
• Biological band (10^-4 to 10^-2 s: 100 μs to 10 ms)
Put <object> in <location>
• Includes moving to <object>, picking it up, moving to <location>, opening <location> if necessary, depositing <object>, closing <location> if necessary
• Fails if another object is already in <location> (or can extend to put the second object in the work area?)
Cook <object>
• Clears the location where the object will be cooked
• Turns on location to correct temperature (background knowledge in semantic memory!)
• If it needs to preheat (oven), waits for it to preheat
• Puts object in location; waits
• Tests temperature or other appropriate sensor (toothpick for cake?)
• Removes object from oven/stove and places it on the workspace
• Turns off oven/stove
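The "Put <object> in <location>" decomposition above can be sketched as ordinary code. The primitive action names and flags below are our own invention for illustration, not Soar operators.

```python
def put(obj, loc, loc_openable, loc_occupied):
    """Decompose 'Put <obj> in <loc>' into a list of primitive action strings.

    loc_openable: whether the location must be opened/closed (e.g. an oven).
    loc_occupied: whether another object already fills the location.
    """
    if loc_occupied:
        # Per the slide: fails if another object is already in the location.
        raise RuntimeError(f"{loc} already holds another object")
    steps = [f"move-to {obj}", f"pick-up {obj}", f"move-to {loc}"]
    if loc_openable:
        steps.append(f"open {loc}")
    steps.append(f"deposit {obj} in {loc}")
    if loc_openable:
        steps.append(f"close {loc}")
    return steps

print(put("cake", "oven", loc_openable=True, loc_occupied=False))
# ['move-to cake', 'pick-up cake', 'move-to oven', 'open oven',
#  'deposit cake in oven', 'close oven']
```

The "Cook <object>" task would compose this with further steps (preheat, wait, sense, remove, turn off), which is exactly the kind of hierarchical, condition-guarded decomposition a cognitive architecture encodes as operators.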