8/2/2019 Dr. GV. Uma 3
Natural Language Processing
Applications
NLP aims at:
- making computers "talk"
- giving computers the language ability of humans
Dialog systems
Fiction:
- C3PO (Star Wars)
- HAL (2001: A Space Odyssey)
Dialog systems
Reality:
- E-commerce: AINI, a chatterbot integrated with a 3D animated agent
  - Improve customer service
  - Reduce customer reliance on human operators
Dialog systems
Reality:
- AutoTutor (http://www.autotutor.org/what/what.htm)
  - Students learn by holding a conversation in natural language
  - Animated agent: synthesized speech, intonation, facial expressions, and gestures
  - Demo (from 2002)
Question answering
- Ask complete questions:
  - What does "divergent" mean?
  - How many states were in Europe in 2007?
  - What is the occupation of Bill Clinton's wife?
  - What do scientists think about global warming?
What would HAL need to engage in this dialog?

Dave Bowman: Hello, HAL, do you read me, HAL?
HAL: Affirmative, Dave, I read you.
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry Dave, I'm afraid I can't do that.
Dave Bowman: What's the problem?
HAL: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to jeopardize it.
Dave Bowman: I don't know what you're talking about, HAL.
HAL: I know you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen.
Dave Bowman: Where the hell'd you get that idea, HAL?
HAL: Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move.
Speech recognition / speech synthesis
- How words are pronounced in terms of sequences of sounds
- How each of these sounds is realized acoustically
Morphology: can't, I'm, we're, lips...
- Producing and recognizing variations of individual words
- The way words break down into component parts
Phonetics
- The sounds of speech
- Long vowels: /i:/, /ɑ:/, /ɔ:/, /ɜ:/ and /u:/
- 'there' => /ðeə/
- 'there on the table' => /ðeər ɒn ðə teɪbl/
- Exercises
- Articulatory phonetics: production of speech sounds
- Acoustic phonetics: properties of sound waves (frequency and harmonics)
- McGurk effect
Phonology
- Describes the way sounds function to encode meaning
- Phoneme: speech sound that helps us construct meaning
  - rubble, rabble, rebel, Ribble, robble... differ only in their vowel phoneme
- A phoneme can be realized in different forms depending on context (allophones)
- Speech synthesis uses allophones
  - e.g., SpeakJet
Morphology
- Studies the structure of words
  - walks, walking, walked → walk
- Lemma + part of speech = lexeme
  - walk, walking, walked → walk
- Inflectional morphology: decomposes a word into a lemma and one or more affixes giving information about tense, gender, number
- Derivational morphology: decomposes a word into a lemma and one or more affixes giving information about meaning and category
- Exceptions and irregularities?
  - Women → woman, plural
  - Aren't → are not
Morphology
Methods
- Lemmatisation: process of grouping together the different inflected forms of a word so they can be analysed as a single item
  - Needs to determine the part of speech of a word in a sentence (requiring grammar knowledge)
- Stemming: operates on a single word without knowledge of context
  - Cannot discriminate between words which have different meanings depending on part of speech
  - Simpler and faster, which is sufficient for some applications
- Example
  - walking → lemma: walk, matched in both stemming and lemmatization
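The contrast above can be sketched in a few lines of code. This is a toy illustration, not a real stemmer or lemmatizer: the suffix list and the lemma table below are assumptions made up for the example.

```python
# Toy suffix-stripping stemmer vs. lexicon-based lemmatizer.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip a known suffix without any grammatical knowledge."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

# A lemmatizer needs a lexicon (and ideally the part of speech)
# to handle irregular forms that pure suffix stripping misses.
LEMMAS = {"women": "woman", "was": "be", "better": "good"}

def lemmatize(word):
    return LEMMAS.get(word, stem(word))

print(stem("walking"))     # walk
print(lemmatize("women"))  # woman
print(stem("women"))       # women (stemming fails on irregular forms)
```

Note how the stemmer handles regular forms like "walking" but leaves the irregular plural "women" untouched, while the lexicon-based lemmatizer recovers "woman".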
Morphology
Method and applications
- Method: finite state transducers
- To resolve anaphora:
  - Sarah met the women in the street... [women (pl)]
- For spell checking and for generation:
  - * The women (pl) is (sg)
- For information retrieval
- ...
I'm sorry Dave, I can't do that
Syntax
- The structure of language
- Languages have structure:
  - not all sequences of words over the given alphabet are valid
  - when a sequence of words is valid (grammatical), a natural structure can be induced on it
Expressions: I am sorry Dave, I can't do that
- Grammars are used to describe the syntax of a language
- Syntactic analysers and surface realisers assign a syntactic structure to a string / semantic representation on the basis of a grammar
Parse tree:
- An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar
- The interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the grammar
Syntax
Tree example:

[S [NP John]
   [VP [Adv often]
       [V gives]
       [NP [Det a] [N book]]
       [PP [Prep to] [NP Mary]]]]
Parsing: words → syntactic tree
- Algorithm: parser. A parser checks for correct syntax and builds a data structure.
- Resources used: lexicon + grammar
  - Statistical: grammar acquired from a treebank
  - Treebank: text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree.
- Difficulty: coverage and ambiguity
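As a concrete sketch of what a parser does, here is a minimal CKY-style recognizer for a toy grammar in Chomsky normal form; the lexicon and rules below are illustrative assumptions, not a real treebank grammar.

```python
# Toy lexicon and binary grammar rules (Chomsky normal form).
LEXICON = {"John": "NP", "Mary": "NP", "book": "N", "gives": "V", "a": "Det"}
RULES = {
    ("V", "NP"): "VP",
    ("Det", "N"): "NP",
    ("NP", "VP"): "S",
}

def recognize(words, start="S"):
    """CKY recognition: is `words` derivable from `start`?"""
    n = len(words)
    # chart[i][j] holds the non-terminals spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for a in chart[i][k]:
                    for b in chart[k][j]:
                        if (a, b) in RULES:
                            chart[i][j].add(RULES[(a, b)])
    return start in chart[0][n]

print(recognize("John gives a book".split()))  # True
```

A full parser would also record backpointers in the chart to recover the tree itself, and a statistical parser would attach probabilities to the rules.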
Syntax
Applications
- For spell checking
  - It's a fair exchange → ok, a syntactic tree can be built
- To construct the meaning of a sentence
- To generate a grammatical sentence
- John loves Mary → love(j,m)   Agent = Subject
- Mary loves John → love(m,j)   Agent = Subject
- Mary is loved by John → love(j,m)   Agent = By-Object
Where the hell'd you get that idea, HAL?
Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move.
Lexical semantics
Meaning of words: get, idea, hell (dictionary senses)

get:
1. come to have or hold; receive.
2. succeed in attaining, achieving, or experiencing; obtain.
3. experience, suffer, or be afflicted with.
4. move in order to pick up, deal with, or bring.
5. bring or come into a specified state or condition.
6. catch, apprehend, or thwart.
7. come or go eventually or with some difficulty.
8. move or come into a specified position or state.
...

idea:
1. a thought or suggestion about a possible course of action.
2. a mental impression.
...
4. (the idea) the aim or purpose.

hell:
1. a place regarded in various religions as a spiritual realm of evil and suffering, often depicted as a place of perpetual fire beneath the earth to which the wicked are sent after death.
...
3. a swear word that some people use when they are annoyed or surprised
Who is the master?
- Context?
- Semantic relations?
(Lewis Carroll, Through the Looking-Glass)
Where the hell did you get that idea?
- hell: a swear word that some people use when they are annoyed or surprised, or to emphasize something
- get that idea: have this belief
- Definition and representation of meaning
- Semantic relations
- Interaction between semantics and syntax
Semantic relations
- Paradigmatic relation (substitution): blablabla word1 bla bla bla → word2

  "How are you doing?" I would ask. "Ask me how I am feeling?" he answered. "... I am very happy and very sad." "How can you be both at the same time?" I asked in all seriousness, a girl of nine or ten. "Because both require each other's company. They live in the same house. Didn't you know?"
  (Terry Tempest Williams, The Village Watchman, 1994)

- Synonymy: sofa = couch = divan = davenport
- Antonymy: good/bad, life/death, come/go
- Contrast: sweet/sour/bitter/salt, solid/liquid/gas
- Hyponymy, or class inclusion: cat ...
Syntagmatic relations: relations between words that go together in a syntactic structure
- Collocation: heavy rain, to have breakfast, to deeply regret...
- Argumental structure
  - Someone breaks something with something → 3 arguments
  - Difficulty: number of arguments? Can an argument be optional?
    - John broke the window
    - John broke the window with a hammer
    - The window broke
  - semantic argument ≠ syntactic argument
- Thematic roles: agent, patient, goal, experiencer, theme...
Semantics / syntax
Lexicon
- Subcategorisation frames
  - to run: NP1
  - to eat: NP1, NP2
  - to give: NP1, NP2, PP3 (to)
  - envious: NP1, PP2 (of)
Semantics / syntax
Lexicon
- Logic representation: eat(x, y), give(x, y, z)
- Thematic roles: to give [agent, theme, goal], to buy [agent, theme, source], to love [experiencer, patient]
- Link with syntax: break (Agent, Instrument, Patient)
  - Agent → subj
  - Instrument → subj, with-pp
  - Patient → obj, subj
- Selectional restrictions: semantic features on arguments
  - John eats bread → theme [+solid] [+edible]
  - *The banana eats → filtered out
  - *John eats wine → filtered out
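The filtering idea can be sketched directly: represent selectional restrictions as feature sets and reject argument fillers that lack the required features. The feature lexicon below is an illustrative assumption made up for the example.

```python
# Toy semantic-feature lexicon.
FEATURES = {
    "John": {"animate"},
    "banana": {"solid", "edible"},
    "bread": {"solid", "edible"},
    "wine": {"liquid", "drinkable"},
}

# "eat" wants an animate agent and a solid, edible theme.
EAT = {"agent": {"animate"}, "theme": {"solid", "edible"}}

def acceptable(verb_frame, agent, theme):
    """Accept only if each argument carries all required features."""
    return (verb_frame["agent"] <= FEATURES[agent]
            and verb_frame["theme"] <= FEATURES[theme])

print(acceptable(EAT, "John", "bread"))    # True
print(acceptable(EAT, "banana", "bread"))  # False (*The banana eats bread)
print(acceptable(EAT, "John", "wine"))     # False (*John eats wine)
```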
Word sense disambiguation
- For machine translation: Le robinet fuit / Le voleur fuit → the tap leaks / the thief runs away
- For information retrieval (and cross-language information retrieval)
  - Search on word meaning rather than word form
  - Keyword disambiguation → more relevance
QA: Who assassinated President McKinley?
- Expected answer: Person, with thematic role Agent of a target synonymous with "assassinated"
- False positive (1): In [ne=date 1904], [ne=person_description President] [ne=person Theodore Roosevelt], who had succeeded the [target assassinated] [role=patient [ne=person William McKinley]], was elected to a term in his own right as he defeated [ne=person_description Democrat] [ne=person Alton B. Parker]
- Correct answer (8): [role=temporal In [ne=date 1901]], [role=patient [ne=person_description President] [ne=person William McKinley]] was [target shot] [role=agent by [ne=person_description anarchist] [ne=person Leon Czolgosz]] [role=location at the [ne=event Pan-American Exposition] in [ne=us_city Buffalo], [ne=us_state N.Y.]]

(Using Semantic Representations in Question Answering, Sameer Pradhan et al., 2003)
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry Dave, I'm afraid I can't do that.
Pragmatics: what speakers intend by their use of sentences
- STATEMENT: HAL, the pod bay door is open.
- QUESTION: HAL, is the pod bay door open?
- REQUEST: HAL, open the pod bay door.
- Speech acts (requesting, greeting, apologizing...)
Where the hell'd you get that idea, HAL?
→ Dave and Frank were planning to disconnect me
Much of language interpretation is dependent on the preceding discourse/dialogue
Linguistic knowledge in NLP
Summary
- Phonetics and phonology: knowledge about linguistic sounds
- Morphology: knowledge of the meaningful components of words
- Syntax: knowledge of the structural relationships between words
- Semantics: knowledge of meaning
- Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
- Discourse: knowledge about linguistic units larger than a single utterance
Ambiguity
- Most language processing can be viewed as resolving ambiguity
- Ambiguous item: multiple, alternative linguistic structures can be built for it
I made her duck:
- I cooked waterfowl for her.
- I cooked waterfowl belonging to her.
- I created the (plaster?) duck she owns.
- I caused her to quickly lower her head or body.
- duck: verb / noun → part-of-speech tagging
- Semantic ambiguity: make: create / cook → word sense disambiguation
- Syntactic ambiguity: make: transitive / ditransitive; [her duck] / [her] [duck] → syntactic disambiguation / parsing
Phonetic ambiguity:
- Recognise speech / wreck a nice beach
Speech act interpretation:
- Can you switch on the computer? → Question or request?
- Ambiguity: the same sentence can mean different things
- Paraphrase: there are many ways of saying the same thing
  - Beer, please.
  - Can I have a beer?
  - Give me a beer, please.
  - I would like beer.
  - I'd like a beer, please.
- In generation (Meaning → Text), this implies making choices
- Combinatorial problem
Models and algorithms
- Linguistic knowledge can be captured through the use of a small number of formal models or theories
- These models and theories are all drawn from the standard toolkit of computer science, mathematics and linguistics
Models:
- State machines
- Rule systems
- Logic
- Probabilistic models
- Vector-space models
Algorithms:
- Dynamic programming
- Machine learning: classifiers / sequence models
- Learning algorithms: expectation maximization (EM)
State machines
- States, transitions among states, input
- Finite-state automata (deterministic / non-deterministic)
- Finite-state transducers
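A finite-state automaton is small enough to sketch completely. The toy machine below accepts the "sheep language" b a a+ ! (baa!, baaa!, ...), a classic textbook example; the transition table is the whole definition, and unlisted (state, symbol) pairs reject.

```python
# Deterministic FSA as a transition table: (state, symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def accepts(string, start=0):
    """Run the automaton; reject on any missing transition."""
    state = start
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

print(accepts("baa!"))    # True
print(accepts("baaaa!"))  # True
print(accepts("ba!"))     # False
```

A finite-state transducer extends this idea by attaching an output symbol to each transition, so the machine maps strings to strings instead of just accepting or rejecting them.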
Formal rule systems
- Regular grammars
- Context-free grammars
- Feature-augmented grammars
State machines and formal rule systems are the main tools used when dealing with phonology, morphology, and syntax.
Logic
- First-order logic / predicate calculus
- Lambda-calculus, feature structures, semantic primitives
- These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has tended toward techniques drawn from non-logical lexical semantics
Probabilistic models
- Crucial for capturing every kind of linguistic knowledge
- Each of the other models can be augmented with probabilities. Example: the state machine augmented with probabilities becomes a weighted automaton, or Markov model.
- Hidden Markov models (HMMs): part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, machine translation...
- Key advantage of probabilistic models: ability to resolve ambiguity. Almost any speech and language processing problem can be recast as: given N choices for some ambiguous input, choose the most probable one.
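The "choose the most probable of N choices" view can be made concrete with a toy noisy-channel spelling corrector: pick the candidate word maximizing P(word) x P(observed | word). All the probabilities below are made-up illustrative numbers, not estimates from any corpus.

```python
# Toy prior over intended words, P(word).
PRIOR = {"the": 0.05, "thaw": 0.0001, "they": 0.01}

# Toy channel model, P(observed typo | intended word).
CHANNEL = {
    ("thw", "the"): 0.01,
    ("thw", "thaw"): 0.2,
    ("thw", "they"): 0.001,
}

def correct(observed, candidates):
    """argmax over candidates of P(word) * P(observed | word)."""
    return max(candidates,
               key=lambda w: PRIOR[w] * CHANNEL[(observed, w)])

print(correct("thw", ["the", "thaw", "they"]))  # the
```

Here the high prior of "the" outweighs the better channel fit of "thaw"; trading off these two probability sources is exactly what the noisy-channel decoding metaphor describes.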
Vector-space models
- Based on linear algebra
- Word meanings
Search: processing as search through a space of states representing hypotheses about an input
- Speech recognition: search through a space of phone sequences for the correct word
- Parsing: search through a space of trees for the correct parse
- Machine translation: search through a space of translation hypotheses for the correct translation of a sentence into another language
Machine learning models: classifiers, sequence models
- Based on attributes describing each object
- Classifier: attempts to assign a single object to a single class
- Sequence model: attempts to jointly classify a sequence of objects into a sequence of classes
- Example, deciding whether a word is spelled correctly:
  - Classifiers (decision trees, support vector machines, Gaussian mixture models, logistic regression) make a binary decision (correct or incorrect) for one word at a time
  - Sequence models (hidden Markov models, conditional random fields) assign correct/incorrect labels to all the words in a sentence at once
A brief history
- 1940s-1950s: foundational insights
- 1950s-1970s: symbolic / statistical camps
- 1970-1983: four paradigms
- 1983-1993: empiricism and finite state models
- 1994-1999: field unification
- 2000-2008: empiricist trends
1940s-1950s: foundational insights
- The automaton
- Probabilistic / information-theoretic models
1940s-1950s
The automaton
- Turing (1936): model of algorithmic computation
- McCulloch-Pitts neuron (McCulloch and Pitts, 1943): a simplified model of the neuron as a kind of computing element (propositional logic)
- Kleene (1951, 1956): finite automata and regular expressions
- Shannon (1948): probabilistic models of discrete Markov processes
- Chomsky (1956): finite state machines as a way to characterize a grammar
- Context-free grammars for natural languages (Chomsky, 1956)
- Backus (1959) and Naur et al. (1960): the ALGOL programming language
1940s-1950s
Probabilistic algorithms
- Shannon: the metaphor of the noisy channel
- Entropy as a way of measuring the information capacity of a channel, or the information content of a language
- First measure of the entropy of English using probabilistic techniques
Speech
- Sound spectrograph (Koenig et al., 1946)
- Foundational research in instrumental phonetics
- First machine speech recognizers (early 1950s): in 1952, a Bell Labs statistical system could recognize any of the 10 digits from a single speaker (Davis et al., 1952)
Machine translation
- Major attempts in the US and the USSR
- Georgetown University, Washington system
- The ALPAC report (1964) concluded:
  - MT not possible in the near future
  - Funding should cease for MT!
  - Basic research should be supported
  - Word-to-word translation does not work
  - Linguistic knowledge is needed
1950s-1970s
Two camps: the symbolic paradigm and the statistical paradigm
1950s-1970s
Symbolic paradigm 1: formal language theory and generative syntax
- 1957: Noam Chomsky's Syntactic Structures
  - A formal definition of grammars and languages
  - Provides the basis for automatic syntactic processing of NL expressions
- Formal semantics for NL
  - Basis for the logical treatment of NL meaning
- 1967: Woods' procedural semantics
  - A procedural approach to the meaning of a sentence
  - Provides the basis for automatic semantic processing of NL expressions
1950s-1970s
Symbolic paradigm 2: parsing algorithms
- Top-down and bottom-up parsing, dynamic programming
- Transformations and Discourse Analysis Project (TDAP) (Harris, 1962)
  - Reimplemented by Joshi and Hopely (1999) and Karttunen (1999) as a cascade of finite-state transducers
1950s-1970s
Symbolic paradigm 3: artificial intelligence
- Summer of 1956: John McCarthy, Marvin Minsky, Claude Shannon and Nathaniel Rochester; work on reasoning and logic
- Newell and Simon: the Logic Theorist and the General Problem Solver
- Early natural language understanding systems
  - Toy domains
  - Combination of pattern matching and keyword search
  - Simple heuristics for reasoning and question-answering
1950s-1970s
Statistical paradigm
- Bayesian methods applied to the problem of optical character recognition
  - Bledsoe and Browning (1959): Bayesian text recognition using a large dictionary, computing the likelihood of each observed letter sequence given each word in the dictionary by multiplying the likelihoods for each letter
- Bayesian methods applied to authorship attribution on The Federalist Papers: Mosteller and Wallace (1964)
- Testable psychological models of human language processing
- Resources
  - First online corpora: the Brown corpus of American English
  - An on-line Chinese dialect dictionary
Symbolic
- Based on hand-written rules
- Requires linguistic expertise
- No frequency information
- Often more precise than statistical approaches
- Error analysis is usually easier than for statistical approaches

Statistical
- Supervised or unsupervised
- Not much linguistic expertise required
- Robust and quick
- Error analysis is often difficult
1970-1983
Statistical paradigm: speech recognition algorithms
- Hidden Markov models (HMMs) and the metaphors of the noisy channel and decoding
- Jelinek, Bahl, Mercer, and colleagues at IBM's Thomas J. Watson Research Center
- Baker at Carnegie Mellon University
- Baum and colleagues at the Institute for Defense Analyses in Princeton
- AT&T's Bell Laboratories
- Rabiner and Juang (1993): descriptions of the wide range of this work
1970-1983
Logic-based paradigm
- Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975)
- Definite clause grammars (Pereira and Warren, 1980)
- Functional grammar (Kay, 1979)
- Lexical Functional Grammar (LFG) (Bresnan and Kaplan, 1982): importance of feature structure unification
1970-1983
Natural language understanding 1
- SHRDLU system: simulated a robot embedded in a world of toy blocks (Winograd, 1972)
- Accepted natural-language text commands: Move the red block on top of the smaller green one
- First to attempt to build an extensive (for the time) grammar of English (based on Halliday's systemic grammar)
- OK for parsing; weaker on semantics and discourse
Natural language understanding 2
- Yale school: series of language understanding programs
  - Conceptual knowledge (scripts, plans, goals...)
  - Human memory organization
  - Network-based semantics (Quillian, 1968)
  - Case roles (Fillmore, 1968); representations of case roles (Simmons, 1973)
1970-1983
The logic-based and natural-language-understanding paradigms were unified in the LUNAR question-answering system (Woods, 1967, 1973), which uses predicate logic as a semantic representation
Discourse modelling
Four key areas in discourse:
- Substructure in discourse (Grosz, 1977)
- Discourse focus (Grosz, 1977; Sidner, 1983)
- Automatic reference resolution (Hobbs, 1978)
- BDI (Belief-Desire-Intention): framework for logic-based work on speech acts (Perrault and Allen, 1980; Cohen and Perrault, 1979)
1983-1993
Return of state models
- Finite-state phonology and morphology (Kaplan and Kay, 1981)
- Finite-state models of syntax (Church, 1980)
Return of empiricism
- Probabilistic models throughout speech and language processing
- IBM Thomas J. Watson Research Center: probabilistic models of speech recognition
- Data-driven approaches to part-of-speech tagging, parsing, and semantics
- New focus on model evaluation
  - Held-out data
  - Quantitative metrics for evaluation
  - Comparison of performance on these metrics with previously published research
1994-1999
Major changes
- Probabilistic and data-driven models had become quite standard
  - Parsing, part-of-speech tagging, reference resolution, and discourse processing algorithms incorporate probabilities
  - Evaluation methodologies from speech recognition and information retrieval
- Commercial exploitation (speech recognition, spelling and grammar correction)
- Rise of the Web: need for language-based information retrieval and information extraction
Resources and corpora
- Disk space becomes cheap
- Machine-readable text becomes ubiquitous
- Statistical models can be trained on real data
- 1994: the British National Corpus is made available, a balanced corpus of British English
- Mid-1990s: WordNet (Fellbaum & Miller), a computational thesaurus developed by psycholinguists
2000-2008
Empiricist trends 1
- Linguistic Data Consortium (LDC) ...
- Annotated collections (standard text sources with various forms of syntactic, semantic, and pragmatic annotations)
  - Penn Treebank (Marcus et al., 1993)
  - PropBank (Palmer et al., 2005)
  - TimeBank (Pustejovsky et al., 2003b)
  - ...
- More complex traditional problems cast as problems in supervised machine learning
  - Parsing and semantic analysis
- Competitive evaluations
  - Parsing (Dejean and Tjong Kim Sang, 2001)
  - Information extraction (NIST, 2007a; Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003)
  - Question answering (Voorhees and Tice, 1999), and summarization (Dang, 2006)
Empiricist trends 2
- More serious interplay with the statistical machine learning community
  - Support vector machines (Boser et al., 1992; Vapnik, 1995)
  - Maximum entropy techniques (multinomial logistic regression) (Berger et al., 1996)
  - Graphical Bayesian models (Pearl, 1988)
Empiricist trends 3
- Largely unsupervised statistical approaches
  - Machine translation (Brown et al., 1990; Och and Ney, 2003)
  - Topic modeling (Blei et al., 2003)
- Effective applications could be constructed from unannotated data alone
Is a talking machine a thinking machine?
- Language is closely tied to human cognitive abilities
- Turing (1950), "Computing Machinery and Intelligence"
- Imprecision in the terms "machine" and "think"
The Turing test
- The computer's use of language forms the basis for determining if the machine could think
- If the machine could win the game, it would be judged intelligent
- Three participants: two people and a computer
- A human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human
- If the judge cannot reliably tell which is which, then the machine is said to pass the test
ELIZA: a program that mimics human conversation
- Joseph Weizenbaum, Massachusetts Institute of Technology
- The user types in natural language
- ELIZA then analyzes the user's statement and generates some response, which it types out
- http://www-ai.ijs.si/eliza/eliza.html
ELIZA: sample conversation

Men are all alike.
IN WHAT WAY?
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE?
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY
ELIZA's strategy: keywords
- The input sentence is scanned for a keyword
- When a keyword is found, the sentence is transformed according to a rule associated with the keyword
- Else a content-free remark or, under certain conditions, an earlier transformation is retrieved
- The text so computed or retrieved is then printed out
ELIZA: example transformations
- "I am X" → "How long have you been X?"
  - I am very unhappy.
  - HOW LONG HAVE YOU BEEN VERY UNHAPPY?
- "X you Y me" → "What makes you think I Y you?"
  - It seems that you hate me.
  - WHAT MAKES YOU THINK THAT I HATE YOU?
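These transformation rules map naturally onto regular expressions. The sketch below implements just the two rules above plus the content-free fallback; it is a tiny illustrative subset, not Weizenbaum's full keyword-ranking script, and it does not attempt ELIZA's pronoun swapping.

```python
import re

# Each rule: a pattern for the keyword context, and a response template
# that reuses the captured text.
RULES = [
    (re.compile(r".*\bI am (.*)", re.I),
     "HOW LONG HAVE YOU BEEN {}?"),
    (re.compile(r".*\byou (.*) me\b.*", re.I),
     "WHAT MAKES YOU THINK THAT I {} YOU?"),
]

def respond(sentence):
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            return template.format(match.group(1).rstrip(".!"))
    return "PLEASE GO ON."   # content-free fallback remark

print(respond("I am very unhappy."))
print(respond("It seems that you hate me."))
print(respond("The train was late."))
```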
ELIZA: why does it work?
- A Rogerian psychotherapist can pretend to know almost nothing of the real world
  - "I went for a long boat ride" → "Tell me about boats"
- Not a real dialog (no real communication)
  - ELIZA generates a reaction (often a rather noncommittal one) to the user's input
  - The user cannot ask for information
  - The user cannot issue an order
ELIZA: does it really work?
Eliza: Hello. I am ELIZA. How can I help you?
You: I went to Paris yesterday
Eliza: Can you elaborate on that?
You: I took the train
Eliza: Can you elaborate on that?
You: The train was late
Eliza: Please go on.
You: All the seats were full
Eliza: Please go on.
You: A little boy was having a tantrum
Eliza: Please go on.
ELIZA: does it really work?
- User: I am having fun right now.
  Eliza: Does it please you to believe that I am having fun right now.
- World knowledge:
  User: I am a human.
  Eliza: How long have you been a human?
Yes/no questions
- User: Are you a good therapist?
  Eliza: Would you prefer it if I were a good therapist?
- User: I'm wondering about your skills as a therapist
  Eliza: ...
Wh-questions
- User: Who are you?
  Eliza: Would you prefer it if I were not?
- User: What do you want me to tell?
  Eliza: Does that question interest you?
Complex assertions
- User: I don't have a problem. I'm quite happy, I have a girlfriend, I study. Everything is OK!
  Eliza: ... girlfriend, you study. Everything is OK!
...
- When people hear language from computers, they talk about them and interact with them as social entities
- Weizenbaum (1976) notes that many people continued to believe in ELIZA's abilities even after the program's operation was explained to them
- People act toward computers as if they were people and expect that computers should be able to understand their needs and be capable of interacting with them naturally
- Given these predispositions, speech- and language-based systems are not supposed to be intelligent, but they may provide users with the most natural interface for many applications
- So what about the Turing test?
Three main types of applications:
- Language input technologies
- Language output technologies
- ...

Language input technologies
- Speech recognition
- Handwriting recognition
- Retroconversion
Speech recognition
Two main types of applications:
- Desktop control: dictation, voice control, navigation
- Telephony-based transactions: travel reservation, remote banking, pizza ordering, voice control
Speech recognition is not understanding!
Requires training corpora (cf. the Parole team, Yves Laprie)
Desktop control
Fielded products:
- IBM ViaVoice (www.software.ibm.com/speech)
- ScanSoft's Dragon NaturallySpeaking (www.lhsl.com/naturallyspeaking)
- Demo
See also the Google directory category:
http://directory.google.com/Top/Computers/SpeechTechnology/
Dictation
Dictation systems can do more than just transcribe what was said:
- leave out the 'ums' and 'ehs'
- implement corrections that are dictated
- fill the information into forms
- rephrase sentences (add missing articles, verbs and corrections)
- communicate what is meant, not what is said
- execute commands to the word processing application (speech macros, e.g., to insert frequently used blocks of text or to navigate through a form)
Telephony-based transactions
Fielded products:
- ScanSoft (www.scansoft.com)
- Telstra directory enquiry (tel. 12455)
See also: http://directory.google.com/Top/Computers/SpeechTechnology/Telephony/
Optical character recognition
Key focus: printed material → computer-readable representation
Applications:
- Scanning (text → digitized format)
- Business card readers (to scan the printed information from business cards into the correct fields of an electronic address book)
- Website construction from printed documents
Fielded products:
- Caere's OmniPage (www.scansoft.com)
- Xerox's TextBridge (www.scansoft.com)
- ExperVision's TypeReader (www.expervision.com)
Handwriting recognition
Key focus: human handwriting → computer-readable representation
Applications:
- Forms processing
- Mail routing
- Personal digital assistants (PDAs)
Fielded products:
- Isolated letters:
  - ...
  - Communication Intelligence Corporation's Jot (www.cic.com)
- Cursive scripts:
  - Motorola's Lexicus
  - ParaGraph's CalliGrapher (www.paragraph.com)
- cf. the READ team (Abdel Belaid)
Document structure recognition
Key focus: identify the logical and physical structure of documents
Applications:
- Recognising bibliographical references
- Recognising mathematical formulae
- Document classification
- Spoken language dialog systems
- Machine translation
- Text summarisation
- Search and information retrieval
Spoken language dialog systems (SLDS)
Goal: a system that you can talk to in order to carry out some task
Key focus:
- Speech recognition
- Speech synthesis
- Dialogue management
Applications:
- Information provision systems: provide information in response to a query (request for timetable information, weather information)
- Transaction systems: buying/selling stocks or reserving a seat on a plane
SLDS: state of the art
- Mostly menu-based systems
- User initiative remains limited (or likely to result in errors)
SLDS: state of the art
- Limited transaction and information systems
  - Stock broking system
  - American Airlines information system
- Limited finite-state dialogue management
- NL understanding is poor
Fielded products:
- SpeechWorks (www.scansoft.com)
- Philips (www.speech.philips.com)
See also the Google directory category:
http://directory.google.com/Top/Computers/SpeechTechnology/
Machine translation
Key focus: translating a text written/spoken in one language into another language
Applications:
- Spoken language translation services
Fielded systems:
- Taum-Meteo (1979): weather reports (English/French); highly successful
- Systran: among several European languages
  - Human-assisted translation
  - Rough translation
  - Used over the internet through AltaVista: http://babelfish.altavista.com
Machine-aided translation is mostly used for rough translation on the web (Systran) and for restricted domains (TAUM Meteo) or controlled languages.
Text summarisation
Key focus: text → shorter version of the text
Applications:
- To decide whether it's worth reading the full text
- To read the summary instead of the full text
- To automatically produce abstracts
Three main steps:
1. Extract "important sentences" (compute document keywords and score document sentences)
2. Cohesion check: spot anaphoric references and modify the text accordingly (e.g., add the sentence containing the pronoun's antecedent; remove such sentences; remove the pronoun)
3. Balance and coverage: modify the summary to have an appropriate text structure (delete redundant sentences; harmonize the tense of verbs)
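Step 1 can be sketched as simple frequency-based sentence scoring; the stop-word list and scoring scheme below are illustrative assumptions, far simpler than the cue- and location-based methods real systems use.

```python
from collections import Counter

STOP = {"the", "a", "is", "of", "and", "to", "in", "it"}

def summarize(sentences, n=1):
    """Keep the n sentences richest in frequent content words."""
    words = [w.lower().strip(".,") for s in sentences for w in s.split()]
    freq = Counter(w for w in words if w not in STOP)

    def score(sentence):
        return sum(freq[w.lower().strip(".,")] for w in sentence.split()
                   if w.lower().strip(".,") not in STOP)

    # Select the n best-scoring sentences, kept in document order.
    best = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in best]

doc = [
    "NLP studies language.",
    "Parsing assigns structure to language.",
    "The weather is nice.",
]
print(summarize(doc))
```

The off-topic weather sentence scores lowest because it shares no frequent content words with the rest of the document, which is the intuition behind keyword-based extraction.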
State of the art
- Sentences extracted on the basis of: location, linguistic cues, statistical information
- Low discourse coherence
- Commercial systems:
  - Copernic (www.copernic.com)
  - MS Word's summarisation tool
- See also: http://www.ics.mq.edu.au/~swan/summarization/projects.htm
Information extraction / retrieval and QA
Given a NL query and a document collection (e.g., web pages):
- Retrieve documents containing the answer (retrieval)
- Fill in a template with the relevant information (extraction)
- Produce an answer to the query (Q/A)
Excludes: how-to questions, yes-no questions, questions that require complex reasoning
Highest possible accuracy estimated at around 70%
Fielded systems:
- QA systems: Ask Jeeves (www.askjeeves.com)
- Artificial Life's ALife Sales Rep (www.artificial-life.com)
- NativeMinds' vReps (www.nativeminds.com)
- Soliloquy (www.soliloquy.com)
Language output technologies
Text-to-Speech
Tailored document generation
Text-to-speech
Key focus: text → speech
Applications:
- Reading documents aloud (e.g., over the telephone)
- Document proofreading
- Voice portals
- Computer-assisted language learning
Requires appropriate use of intonation and prosody
Existing systems:
- Lernout & Hauspie's RealSpeak (www.lhsl.com/realspeak)
- British Telecom's Laureate
- AT&T Natural Voices (http://www.naturalvoices.att.com)
Tailored document generation
Key focus: document structure + parameters → individually tailored documents
Applications:
- Personalised advice giving
- Customised policy manuals
- Web-delivered dynamic documents
Fielded systems:
- Tailored job descriptions
- CoGenTex (www.cogentex.com): project status reports, weather reports
NLP applications
Summary
- All levels of linguistic knowledge are relevant
- Two main problems: ambiguity and paraphrase
- NLP applications use a mix of symbolic and statistical methods
- Current applications are not perfect, as
  - symbolic processing is not robust/portable enough
  - statistical processing is not accurate enough
- Applications can be classified into two main types: aids to human users (e.g., spell checkers, machine-aided translation) and agents in their own right (e.g., NL interfaces to databases, dialogue systems)
- Useful applications have been built since the late 70s
- Commercial success is harder to achieve
- http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Reference: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin