Natural Language Generation
ART on Dialogue Models and Dialogue Systems
François Mairesse, University of Sheffield
www.dcs.shef.ac.uk/~francois
François Mairesse, University of Sheffield 2
Overview
• NLG: what is it? What does it do?
• Template-based generation (canned text)
• Rule-based generation
• Trainable NLG
Some applications
• Simple report/letter writing
  – WeatherReporter: textual weather reports
  – STOP: personalised smoking-cessation letters
  – ModelExplainer: UML diagram descriptions for software development
• Question answering about knowledge bases
• Automated summarization of text
• Machine translation
• Dialogue systems
Inputs to a generator
• Content plan
  – Meaning representation of what to communicate
    • E.g. describe a particular restaurant
• Knowledge base
    • E.g. database of restaurants
• User model
  – Imposes constraints on output utterance
    • E.g. user wants short utterances
• Dialogue history
    • E.g. to avoid repetitions and to choose referring expressions
Natural language generation objectives
• From a meaning representation of what to say
  – E.g. entities described by features in an ontology
  – E.g. has(WokThisWay, cuisine(bad))
• Output: a natural language string describing the input
  – E.g. “WokThisWay’s food is awful”
• Desirable properties
  – Simple to use
  – Able to generate well-formed, human-like sentences
  – Trainable? Able to learn?
  – Variation in the output?
Template-based generation
• Most common technique in spoken language generation
• In simplest form, words fill in slots:
  “Flights from SRC to DEST on DATE. One moment please.”
• Most common sort of NLG found in commercial systems
• Used in conjunction with concatenative TTS to make natural-sounding output
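Slot filling can be sketched in a few lines of Python. This is only an illustration: the slot names come from the slide, while `FLIGHT_TEMPLATE`, `fill` and the slot values are made up for the example.

```python
from string import Template

# Hypothetical template mirroring the slot names on the slide.
FLIGHT_TEMPLATE = Template("Flights from $SRC to $DEST on $DATE. One moment please.")

def fill(template: Template, slots: dict) -> str:
    """Fill every slot; raises KeyError if a slot value is missing."""
    return template.substitute(slots)

# Illustrative slot values (assumed, not from a real system).
utterance = fill(FLIGHT_TEMPLATE, {"SRC": "Newark", "DEST": "Dallas", "DATE": "September 1"})
print(utterance)
```

Every new utterance type needs its own template string, which is the maintenance cost discussed on the next slide.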
Template-based generation: Pros & Cons
• Pros
  – Conceptually simple
    • No specialized knowledge needed to develop
  – Tailored to the domain, so often good quality
• Cons
  – Lacks generality
    • Repeatedly encode linguistic rules (e.g., subject-verb agreement)
  – Little variation in style
  – Difficult to grow/maintain
    • Each new utterance must be added by hand
Enhance template generation
• Templates can be expanded/replaced to contain information needed to generate more complex utterances
  – Need deeper utterance representations
  – Need linguistic rules to manipulate them
Components of a rule-based generator
• Content planning
  – What information must be communicated?
    • Content selection and ordering
• Sentence planning
  – What words and syntactic constructions will be used for describing the content?
    • Aggregation
      – What elements can be grouped together for more natural-sounding, succinct output?
    • Lexicalization
      – What words are used to express the various entities?
• Realization
  – How is it all combined into a sentence that is syntactically and morphologically correct?
• Prosody assignment (spoken language generation only)
  – How to produce appropriate speech based on the previous levels of representation?
Spoken language generation: pipeline architecture
[Figure: pipeline diagram. Stages: What to Say (Content Planner); How to Say It (Sentence Planner, Surface Realizer, Prosody Assigner); What is Heard (Speech Synthesizer). Module labels: Dialogue Manager, Spoken Language Generation, Speech Synthesis.]
Example
• Output from dialogue manager
  – Two assertions
    has(WokThisWay, cuisine(bad))
    has(WokThisWay, decor(good))
• Content planning
  – Select information ordering
• Sentence planning
  – Choose syntactic templates
  – Choose lexicon
    • bad → awful; cuisine → food quality
    • good → excellent; decor → décor
  – Aggregate the two propositions by merging objects
  – Generate referring expressions
    • ENTITY → this restaurant
[Figure: syntactic template tree with nodes ENTITY, HAVE, MODIFIER, FEATURE and arcs labeled Subj and Obj]
Example (continued)
• Realization
  – Choose correct verb inflection: HAVE → has
  – No article needed for feature names
  – Convert sentence representation into a final string
  – Capitalize first letter and insert punctuation
• Prosody assignment
  – Standard pitch for an assertion
  – Emphasize user preference for food quality by increasing the voice intensity for modifier “awful”
“This restaurant has awful food quality but excellent decor.”
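The text-generation steps of this example (lexicalization, aggregation, referring expression, realization) can be traced with a toy Python sketch. The tuple encoding of the assertions, the `LEXICON` table and the "but"/"and" heuristic are assumptions made for illustration, and the sketch only handles assertions about a single entity.

```python
# Toy lexicon following the slide's word choices; the data layout is an assumption.
LEXICON = {"bad": "awful", "good": "excellent",
           "cuisine": "food quality", "decor": "decor"}

def sentence_plan(assertions):
    """Lexicalize each (entity, feature, value) assertion as a noun phrase."""
    return [f"{LEXICON[value]} {LEXICON[feature]}" for _, feature, value in assertions]

def realize(assertions, referring_expression="this restaurant"):
    """Merge the objects under one subject and verb, then produce the final string."""
    objects = sentence_plan(assertions)
    values = {value for _, _, value in assertions}
    # Assumed heuristic: contrast differing evaluations with "but", else use "and".
    conjunction = " but " if len(values) > 1 else " and "
    body = f"{referring_expression} has {conjunction.join(objects)}"
    # Capitalize the first letter and insert punctuation.
    return body[0].upper() + body[1:] + "."

assertions = [("WokThisWay", "cuisine", "bad"), ("WokThisWay", "decor", "good")]
print(realize(assertions))
```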
Content planning
• Typically look at spoken/textual data to characterize how information is
    • Selected
    • Ordered
    • Combined together
• A content planner will take a meaning representation and produce a content plan tree
  – Leaves are bits of information
  – Internal nodes are rhetorical relations (Mann & Thompson, 1988)
    • E.g. justification, contrast, inference, etc.
Example content plan tree
• For a restaurant recommendation
• Each leaf is associated with a syntactic template
Sentence planning
• Three main tasks
  – Lexicalisation
    • Many ways to express entities and rhetorical relations
      – E.g. Justify(X,Y) → “X because Y”, “X since Y”
    • Typically a domain lexeme database to avoid any misunderstanding
      – E.g. CUISINE → “food”
  – Aggregation
  – Referring expression generation
Sentence planning: aggregation
• Produces shorter utterances and dialogues, but adds complexity
• Simple: combine two sentences using a conjunction
• Merge two sentences with same subject or same object
  – E.g. “The pizza is warm” + “The pizza is tasty” → “The pizza is warm and tasty”
  – E.g. “John bought a TV” + “Sam bought a TV”: DOESN’T ALWAYS WORK!
• Syntactic embedding
  – E.g. “The pizza is warm” + “I’m eating the pizza” → “The pizza that I’m eating is warm”, “I’m eating the warm pizza”
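The same-subject merge can be sketched as follows. The (subject, verb, complement) triple encoding and the function name `merge_same_subject` are assumptions for illustration; real aggregation operates on syntactic structures rather than strings.

```python
def merge_same_subject(clause1, clause2):
    """Merge two (subject, verb, complement) clauses when subject and verb match.

    Falls back to two separate sentences when the clauses cannot be merged.
    """
    subj1, verb1, comp1 = clause1
    subj2, verb2, comp2 = clause2
    if (subj1, verb1) == (subj2, verb2):
        return f"{subj1} {verb1} {comp1} and {comp2}."
    return f"{subj1} {verb1} {comp1}. {subj2} {verb2} {comp2}."

print(merge_same_subject(("The pizza", "is", "warm"), ("The pizza", "is", "tasty")))
```

A purely string-based rule like this one cannot tell when merging changes the meaning, which is why the slide warns that merging does not always work.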
Sentence planning: referring expressions
• How to refer to an entity?
  – “The Frog & Parrot”, “The pub”, “It”, etc.
  – Need to know whether this is the initial reference: use the dialogue history
  – Pronominalization algorithm
    • Trade-off between missed pronouns and inappropriate pronouns
      – Pronominalize all entities previously mentioned?
        No! Need to check for ambiguities, i.e. whether another entity with the same person, gender and number was mentioned
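A minimal sketch of such an ambiguity check, assuming each entity carries a single pronoun that stands in for its person, gender and number. The `Entity` class, its fields and the example entities are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str          # full name, used for the initial reference
    pronoun: str       # stands in for person, gender and number (assumption)
    description: str   # short definite description, e.g. "the pub"

def refer(entity, history):
    """Pick a referring expression given the entities mentioned so far."""
    if entity not in history:
        return entity.name                 # initial reference: use the full name
    competitors = [e for e in history if e != entity and e.pronoun == entity.pronoun]
    if competitors:
        return entity.description          # a pronoun would be ambiguous
    return entity.pronoun

pub = Entity("The Frog & Parrot", "it", "the pub")
restaurant = Entity("WokThisWay", "it", "the restaurant")
print(refer(pub, []), "|", refer(pub, [pub]), "|", refer(pub, [pub, restaurant]))
```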
Pipeline architecture
• Advantages
  – Modularity
    • Helps manage complexity
    • Components can be improved independently
• However
  – Lower level components can’t influence higher level generation decisions
    • E.g. if the utterance’s length needs to be controlled
      – Content and sentence planning decisions need to be influenced by the realizer
  – Many research systems use other architectures, but these are harder to maintain and scale up
  – Do humans use a pipeline?
Question
• If you had to build a dialogue system, which approach would you choose for your NLG component (between templates and more complex linguistic rules) and why? Feel free to choose a particular domain to support your case.
Trainable NLG
Making NLG trainable
• What does it mean?
  – Produce better language automatically by looking at a collection of existing texts
• Why?
  – Make it less domain dependent
    • Different sources of data for different domains
  – Produce more complex utterances
    • Requires less linguistic expertise
    • Idioms can’t be produced by rules
    • E.g. “This restaurant’s food is to die for”
    • E.g. “The service will make you want to kill yourself”
Making NLG trainable
• How?
  – Overgenerate and rank
    • Produce various candidate utterances
      – Rule-based
    • Use a statistical model to rank them
      – Function assigning a score to utterances
      – Typically learned based on textual data
    • Pro
      – Initial generation can be imperfect
        » Conflicts between generation choices
    • Cons
      – Usually high number of utterances to choose from
      – Can be hard to extract good model from data
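The overgenerate-and-rank loop can be sketched as follows. The candidate generator simply enumerates lexical choices, and `toy_score` with its hand-set `TOY_WEIGHTS` is a stand-in for a model learned from data; both are assumptions for illustration.

```python
from itertools import product

def overgenerate(subjects, verbs, objects):
    """Rule-based step: enumerate every combination of lexical choices."""
    return [" ".join(words) for words in product(subjects, verbs, objects)]

def rank(candidates, score):
    """Statistical step: sort candidates from best to worst under `score`."""
    return sorted(candidates, key=score, reverse=True)

# Hand-set weights standing in for a learned scoring function (assumption).
TOY_WEIGHTS = {"the animals": 1.0, "adapted": 1.0, "beasts": -1.0, "conformed": -0.5}

def toy_score(utterance):
    """Sum the weights of all known phrases that occur in the utterance."""
    return sum(w for phrase, w in TOY_WEIGHTS.items() if phrase in utterance)

candidates = overgenerate(["the animals", "beasts"], ["adapted", "conformed"], ["to the changes"])
best = rank(candidates, toy_score)[0]
print(best)
```

The number of candidates grows multiplicatively with the number of choices, which is the first drawback listed above.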
HALogen: combining rules with statistical language models (Langkilde-Geary, 2002)
[Figure: Input → Symbolic Generator → packed set of expressions → Statistical Ranker → Output]
• Symbolic Generator
  – Mapping rules
  – Dictionaries
  – Morphology
• Statistical Ranker
  – N-gram model based on 250 million words of newspaper text
Example input format
(a1 / |conform,adapt|
    :AGENT (n1 / NONHUMAN-ANIMAL)
    :REASON (c1 / |alter>verbify|
        :GPI (e1 / |environs|)))
Example Input and Output
(a1 / |conform,adapt|
    :AGENT (n1 / NONHUMAN-ANIMAL)
    :REASON (c1 / |alter>verbify|
        :GPI (e1 / |environs|)))
Not-so-ideal:
• Beasts are adjusting because of a surround’s alteration.
• Faunas conformed due to alteratia of environs.
• Because of changing of surroundings, creature adapts.
Ideal:
• The animals adapted because of environmental changes.
Recasting input for surface-level syntax
( / “serve”
    :voice passive
    :logical-object <cuisine>
    :logical-subject <venue>)
IF top level contains logical-subject, and also contains voice=passive,
THEN map logical-subject to postmod, and add anchor=by.
( / “serve”
    :voice passive
    :logical-object <cuisine>
    :postmod ( / <venue>
        :anchor by))
( / “serve”
    :voice passive
    :subject <cuisine>
    :postmod ( / <venue>
        :anchor by))
“<Cuisine> is served by <venue>.”
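The IF/THEN recasting rule can be mimicked on a plain Python dictionary. This is not HALogen's actual rule language or data structure; the dict encoding, the `head` key and the function name `recast_passive` are assumptions for illustration.

```python
def recast_passive(frame: dict) -> dict:
    """IF the frame has logical-subject and voice=passive,
    THEN map logical-subject to postmod and add anchor=by.

    Returns a new frame and leaves the input untouched.
    """
    if "logical-subject" in frame and frame.get("voice") == "passive":
        recast = {k: v for k, v in frame.items() if k != "logical-subject"}
        recast["postmod"] = {"head": frame["logical-subject"], "anchor": "by"}
        return recast
    return frame

frame = {"head": "serve", "voice": "passive",
         "logical-object": "<cuisine>", "logical-subject": "<venue>"}
print(recast_passive(frame))
```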
Symbolic Generator
Mapping rules (about 255 rules)
1. Recast one input to another
2. Add missing information to under-specified inputs
3. Assign linear order to constituents
4. Apply functions, such as morphological inflection
Dictionaries
A. Sensus dictionary, based on WordNet (~100,000 words and concepts)
B. Closed-class lexicon
C. User-defined dictionary
Morphology rules
Using statistical language model to prune choices
• How to select the best alternative?
  – Estimate the probability of occurrence based on a corpus: n-gram language models
    • Estimate the probability of a sentence by counting words in a corpus
    • Markov assumption: the probability of a word depends only on the n − 1 previous words
N-gram examples
• Bigram model (n = 2)
• Trigram (n = 3)
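For reference, the standard factorizations of the probability of a sentence w_1 … w_m under these two models are given below (boundary padding symbols before w_1 are assumed and omitted for brevity):

```latex
% Bigram model (n = 2): each word depends on the one previous word
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1})

% Trigram model (n = 3): each word depends on the two previous words
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-2}, w_{i-1})
```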
Computing N-grams
$$P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$$
Slightly more complicated to deal with zeros (interpolation)
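The relative-frequency estimate can be computed directly from counts. The sketch below uses a three-sentence toy corpus and a `<s>` start symbol, both assumptions for illustration, and returns 0.0 for unseen events instead of applying interpolation.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams, with "<s>" marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate count(prev, word) / count(prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus (assumption); a real model would be trained on millions of words.
corpus = ["the pizza is warm", "the pizza is tasty", "the pub is warm"]
unigrams, bigrams = train_bigrams(corpus)
print(bigram_prob("the", "pizza", unigrams, bigrams))
```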
How well do n-grams make linguistic decisions?

Relative pronoun
  visitor who      9      visitors who      20
  visitor which    0      visitors which     0
  visitor that     9      visitors that     14

Preposition (bigrams)
  in Japan      5413      to Japan        1196
  came into      244      arrived into       0
  came to       2443      arrived in       544
  came in       1498      arrived to        35

Preposition (trigrams)
  came to Japan      7    arrived to Japan      0
  came into Japan    1    arrived into Japan    0
  came in Japan      0    arrived in Japan      4

Word choice / singular vs plural
  reliance       567      reliances          0
  trust         6100      trusts          1083
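Using the trigram counts from the table, a preposition can be chosen by picking the most frequent candidate. The dictionary layout and the function name `best_preposition` are assumptions; a real ranker would score whole sentences with smoothed probabilities rather than raw counts.

```python
# Trigram counts copied from the table above.
TRIGRAM_COUNTS = {
    ("came", "to", "Japan"): 7, ("came", "into", "Japan"): 1, ("came", "in", "Japan"): 0,
    ("arrived", "to", "Japan"): 0, ("arrived", "into", "Japan"): 0, ("arrived", "in", "Japan"): 4,
}

def best_preposition(verb, noun, prepositions=("to", "into", "in")):
    """Return the preposition whose (verb, preposition, noun) trigram is most frequent."""
    return max(prepositions, key=lambda p: TRIGRAM_COUNTS.get((verb, p, noun), 0))

print(best_preposition("came", "Japan"), best_preposition("arrived", "Japan"))
```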
How well does HALogen work?
• Minimally specified input frame (bigram model):
  It would sell its fleet age of Boeing Co. 707s because of maintenance costs increase the company announced earlier.
• Minimally specified input frame (trigram model):
  The company earlier announced it would sell its fleet age of Boeing Co. 707s because of the increase maintenance costs.
• Almost fully specified input frame:
  Earlier the company announced it would sell its aging fleet of Boeing Co. 707s because of increased maintenance costs.
N-gram modeling limitations
Higher n produces better results, but leaves less data to estimate each probability correctly!
Highly dependent on the source of text (newspaper articles)
• Spoken language?
N-grams will never model deep relations in a sentence, like correct pronouns or distant subject-verb agreement
• E.g. The restaurant which … has …
SPoT/SParKY: A trainable generator with deeper linguistic features
(Walker et al. 2002)
Stochastic generation
• Randomly generate sentence plan trees from a content plan tree
  – Map rhetorical relations to clause combining operations
    E.g. justification → since, because
         inference → conjunction, period, merge
  – Nodes are ordered
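A rough sketch of the random step, assuming a content plan encoded as nested (relation, children) tuples with strings as leaves. The encoding, the leaf name `assert-reco-best` and the function name are hypothetical; only the relation-to-operation table comes from the slide.

```python
import random

# Clause-combining operations allowed for each rhetorical relation (from the slide).
OPERATIONS = {"justification": ["since", "because"],
              "inference": ["conjunction", "period", "merge"]}

def random_sentence_plan(node, rng):
    """Replace each relation with a random operation and pick a random child order."""
    if isinstance(node, str):          # leaf: a bit of information
        return node
    relation, children = node
    plans = [random_sentence_plan(child, rng) for child in children]
    rng.shuffle(plans)                 # choose one ordering of the child nodes
    return (rng.choice(OPERATIONS[relation]), plans)

# Hypothetical content plan for a restaurant recommendation.
CONTENT_PLAN = ("justification",
                ["assert-reco-best",
                 ("inference", ["assert-reco-cuisine", "assert-reco-service"])])

print(random_sentence_plan(CONTENT_PLAN, random.Random(0)))
```

Calling the function repeatedly with different seeds yields the set of alternative sentence plans that the ranker later scores.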
Generating Sentence Plans
• How to express each piece of information?
  Database of Deep Syntactic Structures (DSyntS, similar to parse trees)
• Operations combine DSyntSs into larger DSyntSs
[Figure: example DSyntS; surviving labels: RESTAURANTNAME, MOD, OBJ, COMPL, SUBJ]
Stochastic generation
• Last step: realization of each alternative
  – RealPro (Lavoie and Rambow 97)
    • Combines the syntactic structure (DSyntS) into a surface sentence, using rules of English (e.g. agreement)
“WokThisWay has the best overall quality among the selected restaurants since it is a Chinese restaurant, with good service, its price is 24 pounds, and it has good food quality.”
“WokThisWay is a Chinese restaurant, with good food quality. It has good service. Its price is 24 pounds. It has the best overall quality among the selected restaurants.”
Trainable sentence ranking
• Training the ranker
  – Training data: user ratings of sentences
  – Learning algorithm: RankBoost (Freund et al. 98)
    • Non-linear function approximation algorithm, in which the function ranks its arguments
      – Generalizes user ratings for any new sentence
• Compute ranking score

              User A   User B
  Sentence 1   4/5      5/5
  Sentence 2   3/5      1/5
  Sentence 3   2/5      3/5
  …            ?        ?
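Ranking learners such as RankBoost are trained on preferences between pairs of items. A minimal sketch of turning the ratings table into such pairs is shown below; averaging over users and skipping ties are assumptions, and this is only the data preparation step, not RankBoost itself.

```python
from itertools import combinations

# Ratings out of 5 from the table above: [User A, User B].
RATINGS = {"Sentence 1": [4, 5], "Sentence 2": [3, 1], "Sentence 3": [2, 3]}

def mean(values):
    return sum(values) / len(values)

def preference_pairs(ratings):
    """Return (better, worse) pairs ordered by mean user rating; ties are skipped."""
    pairs = []
    for a, b in combinations(ratings, 2):
        mean_a, mean_b = mean(ratings[a]), mean(ratings[b])
        if mean_a > mean_b:
            pairs.append((a, b))
        elif mean_b > mean_a:
            pairs.append((b, a))
    return pairs

print(preference_pairs(RATINGS))
```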
Trainable sentence ranking
• Want to learn user preferences, but how to represent each sentence plan tree in a finite way?
• Associate features to each alternative tree
  – Node counts of sentence plan tree
    Leaves ( assert-reco-cuisine, assert-reco-service ) = 1
    Traversal ( WITH-NS-infer, assert-reco-cuisine, assert-reco-service ) = 1
    Leaves-under ( WITH-NS-infer ) = 2
[Figure: sentence plan tree annotated with feature counts]
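One possible reading of these three feature types, sketched on a tree encoded as nested (label, children) tuples. The encoding and the exact feature definitions are assumptions inferred from the slide's examples, not the original SPoT feature extractor.

```python
from collections import Counter

def leaves(node):
    """Leaf labels of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]

def traversal(node):
    """Node labels in pre-order (parent before children)."""
    if isinstance(node, str):
        return [node]
    label, children = node
    return [label] + [n for child in children for n in traversal(child)]

def features(tree):
    """Map a sentence plan tree to a finite bag of named counts."""
    feats = Counter()
    feats["Leaves(" + ", ".join(leaves(tree)) + ")"] += 1
    feats["Traversal(" + ", ".join(traversal(tree)) + ")"] += 1

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node
        # Number of leaves below this internal node (last node wins on repeated labels).
        feats[f"Leaves-under({label})"] = len(leaves(node))
        for child in children:
            visit(child)

    visit(tree)
    return feats

tree = ("WITH-NS-infer", ["assert-reco-cuisine", "assert-reco-service"])
print(dict(features(tree)))
```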
Evaluation Goals
• Major problem: not clear that quality is good enough for real systems
• Training evaluation: shows that the learning algorithm (RankBoost) did a good job learning from judges’ feedback
  – Compare the human score of the highest ranked alternative with the best alternative chosen by the judges
• But doesn’t show
  – That the output quality is good (for real people)
  – How the output quality compares to rule-based approaches or template approaches
[Figure: histogram of number of instances (0 to 80) per score (1 to 5) for Random, SPoT and Best]
Evaluation Experiment
• 60 subjects compared
  – 7 generators
  – On outputs for 20 text plans
  – Provided subjective rating on 1..5 scale
• Communicator: Template-based generator
• SPoT: Trainable sentence planner
• Two Rule-based
• Two Baseline: No Aggregation, Random
• Best: human selection from Random
Experimental Web Page Example
Dialog Turn 2
Dialog Context So Far (applies to all versions of the system utterance):
  SYSTEM: Flying to Dallas, What departure airport was that?
  USER: from Newark on September the first
Possible variants for next dialog utterances by system
Consider the following statement: The system's utterance is easy to understand, well-formed, and appropriate to the dialogue context. For each variant, please rate to what extent you agree with this statement.
1. What time would you like to fly on September the 1 to DALLAS from Newark?
2. Leaving on the 1. Leaving in September. Going to DALLAS. Leaving from Newark. What time would you like to leave?
3. What time would you like to travel on September the 1 to DALLAS from Newark?
Rating options for each variant: Completely disagree / Somewhat disagree / Neither agree nor disagree / Somewhat agree / Completely agree
Results of Evaluation (60 Subjects)
[Figure: bar chart of score (axis from 2 to 4) for Random, No Agg., Rule-Based 1, Rule-Based 2, Best, SPoT and Communicator]
Results of Evaluation
• Random worst, no aggregation second worst
• Rule-based systems scored in medium-range
• SPoT and template-based score equally well
• But SPoT was trained for this domain in days, template-based developed over ~ 2 years!
Linguistic Variation
• Use different models for ranking
  – E.g. n-gram models computed on texts with different style
    • Problem: favors ‘average’ style
• A lot of variation is idiomatic
  – E.g. breaking the ice, beating around the bush
    • Stored in human memory?
• Paraphrasing problem
  – Map a meaning representation to multiple realizations
  – Major problem: not much data available!
Paraphrase acquisition
• With a sentence-aligned corpus
  – Merge nodes of parse trees together recursively (Pang et al., 03)
  – Many false paraphrases
  – Sentence-aligned corpus hard to obtain, even more for spoken dialogues
Paraphrase acquisition
• Without aligned corpus: DIRT (Lin & Pantel, 2001)
  – Compute paths in parse trees
    • Leaves are arguments
    • E.g. N:subj:V ← buy → V:from:N
      X buys something from Y
  – Based on the Distributional Hypothesis:
    If two paths tend to occur in similar contexts, the meanings of the paths tend to be similar
  – For each pair of paths, compute a similarity measure based on the number of occurrences with identical arguments
  – Resulting paraphrases are very noisy and include antonym phrases
Still a lot of work to be done!
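The idea of comparing paths by their shared arguments can be sketched as follows. This is a simplified stand-in, not the DIRT measure: it replaces DIRT's mutual-information weighting with plain Jaccard overlap of slot fillers, and the example triples are invented for illustration.

```python
def slot_fillers(triples, path):
    """Sets of X-slot and Y-slot arguments observed with a given path."""
    xs = {x for p, x, _ in triples if p == path}
    ys = {y for p, _, y in triples if p == path}
    return xs, ys

def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def path_similarity(triples, path1, path2):
    """Geometric mean of the X-slot and Y-slot overlaps of two paths."""
    x1, y1 = slot_fillers(triples, path1)
    x2, y2 = slot_fillers(triples, path2)
    return (jaccard(x1, x2) * jaccard(y1, y2)) ** 0.5

BUY, PURCHASE, SELL = ("X buys something from Y",
                       "X purchases something from Y",
                       "X sells something to Y")
# Invented (path, X argument, Y argument) observations.
TRIPLES = [(BUY, "John", "the shop"), (BUY, "Mary", "the market"),
           (PURCHASE, "John", "the shop"), (PURCHASE, "Mary", "the bakery"),
           (SELL, "the shop", "John")]

print(path_similarity(TRIPLES, BUY, PURCHASE), path_similarity(TRIPLES, BUY, SELL))
```

Argument overlap alone says nothing about polarity, so opposite phrases that share arguments in a real corpus can still score as similar, which is the noise problem noted above.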
Conclusion
• Complex dialogue needs NLG
• Templates are simple to implement and produce good results for a very small domain and inflexible dialogues
• Rule-based NLG allows you to produce richer utterances, but is still highly domain dependent
• Machine learning is a viable alternative to hand-crafting in NLG, and probably the only option for systems with large generation capabilities