
Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

Xiachong Feng1, Xiaocheng Feng1,2, Libo Qin1, Bing Qin1,2, Ting Liu1,2

1 Harbin Institute of Technology, 2 Peng Cheng Laboratory

Dialogue Summarization

• Dialogue summarization aims to generate a succinct summary while retaining the essential information of the dialogue.

2

Dialogue:

Blair: Remember we are seeing the wedding planner after work
Chuck: Sure, where are we meeting her?
Blair: At Nonna Rita's
Chuck: I want to order seafood tagliatelle
Blair: Haha why not
Chuck: We remmber spaghetti pomodoro disaster from our last meeting
Blair: Omg it was over her white blouse
Chuck: :D
Blair: :P

Summary:

Blair and Chuck are going to meet the wedding planner after work at Nonna Rita's. The tagliatelle served at Nonna Rita's are very good.

A Good Summary?

3

Peyrard (2019): a good summary is intuitively related to three aspects:

• Informativeness
• Redundancy
• Relevance

Related Works

• For informativeness
  • Linguistically specific words
  • Domain terminologies
  • Topic words
• For redundancy
  • Similarity-based methods to annotate redundant utterances
• For relevance
  • Topic segmentation

4

Problems

• Relied on human annotations
  • Labor-consuming
• Obtained via open-domain toolkits
  • Dialogue-agnostic
  • Not suitable for dialogues

5

Pre-trained Language Models

6

DialoGPT Annotator

7

Overview

• Keywords Extraction: extracts unpredictable words as keywords.
• Topic Segmentation: inserts a topic segmentation point before an utterance if it is unpredictable.
• Redundancy Detection: detects utterances that are useless for context representation as redundant.

8

Keywords Extraction: DialoGPT_KE

• Motivation: if a word in the golden response is difficult for DialoGPT to infer, we assume it carries high information and can be viewed as a keyword.
• Extracts unpredictable words as keywords.

9

[Figure: DialoGPT_KE. Original dialogue: Tom: yoo guys. EOS1 / John: hey wassup. EOS2 / Tom: Remmber the meeting EOS3 / John: I almost forget it. EOS4 / Tom: fine EOS5 / John: Where? EOS6 / Tom: at Barbara's place. EOS7. The dialogue is converted into context-response pairs (e.g., Context: "Remmber the meeting EOS3" → Response: "I almost forget it. EOS4") and scored by DialoGPT. The word-level losses over each golden response (loss_31, loss_32, ...) mark hard-to-predict words, which are extracted as keywords; averaging them gives an utterance-level loss.]
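To make the loss-based extraction concrete, here is a minimal sketch using the public microsoft/DialoGPT-small checkpoint from HuggingFace Transformers; the word_losses helper and the 4.0 threshold are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: per-word loss of a golden response under DialoGPT.
# Assumes the HuggingFace checkpoint "microsoft/DialoGPT-small";
# the threshold below is illustrative, not the paper's tuned value.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
model.eval()

def word_losses(context: str, response: str):
    """Return (token, negative log-likelihood) for each response token."""
    ctx_ids = tokenizer.encode(context + tokenizer.eos_token)
    resp_ids = tokenizer.encode(response + tokenizer.eos_token)
    input_ids = torch.tensor([ctx_ids + resp_ids])
    with torch.no_grad():
        logits = model(input_ids).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)
    losses = []
    for pos in range(len(ctx_ids), input_ids.size(1)):
        tok_id = int(input_ids[0, pos])
        # The token at position pos is predicted from position pos - 1.
        nll = -log_probs[pos - 1, tok_id].item()
        losses.append((tokenizer.decode([tok_id]), nll))
    return losses

# Tokens whose loss exceeds a threshold are taken as keywords.
pairs = word_losses("Remmber the meeting", "I almost forget it.")
keywords = [tok for tok, nll in pairs if nll > 4.0]
```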

Topic Segmentation: DialoGPT_TS

• Motivation: if the response is difficult for DialoGPT to predict given the context, we assume the response may belong to another topic, i.e., there is a topic boundary between the context and the response.
• Inserts a topic segmentation point before an utterance if it is unpredictable.

10

[Figure: DialoGPT_TS. The same context-response pairs are scored by DialoGPT; the word-level losses of each response are averaged into utterance-level losses (loss_2 ... loss_7), and a segmentation point is inserted before each utterance whose loss is high.]
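Building on word_losses from the previous sketch, the snippet below averages word-level losses into an utterance-level loss and inserts a boundary wherever that loss is high. The fixed 5.0 threshold is a placeholder assumption; the paper's exact selection rule may differ.

```python
# Minimal sketch: utterance-level loss = average of word-level losses;
# a topic boundary is inserted before high-loss utterances.
# The fixed threshold is an assumption for illustration only.
def utterance_loss(context: str, response: str) -> float:
    pairs = word_losses(context, response)  # from the previous sketch
    return sum(nll for _, nll in pairs) / len(pairs)

def topic_boundaries(utterances, threshold=5.0):
    """Return indices i such that a new topic starts at utterances[i]."""
    points = []
    for i in range(1, len(utterances)):
        if utterance_loss(utterances[i - 1], utterances[i]) > threshold:
            points.append(i)
    return points

dialogue = ["yoo guys.", "hey wassup.", "Remmber the meeting",
            "I almost forget it.", "fine", "Where?", "at Barbara's place."]
print(topic_boundaries(dialogue))
```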

Redundancy Detection: DialoGPT_RD

• Motivation: if an utterance brings little information and has a small effect on predicting the response, it is a redundant utterance.
• Detects utterances that are useless for context representation as redundant.

11

[Figure: DialoGPT_RD. The whole dialogue is fed to DialoGPT as one sequence, and the hidden state of each EOS token (h_EOS1 ... h_EOS7) serves as a dialogue context representation. Over successive steps, cosine similarities between context representations are computed (e.g., 0.711, 0.998, 0.991, 0.642, 0.573, 0.993); utterances that leave the context representation nearly unchanged (similarity close to 1) are marked redundant.]
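As a simplified reading of this procedure, the self-contained sketch below flags an utterance as redundant when its EOS hidden state is nearly identical to the previous one; the 0.99 threshold and the adjacent-pair comparison are assumptions, since the figure shows an iterative, step-wise removal.

```python
# Minimal sketch: similarity-based redundancy detection over EOS states.
# The 0.99 threshold and adjacent-pair comparison are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
encoder = AutoModel.from_pretrained("microsoft/DialoGPT-small")
encoder.eval()

def redundant_utterances(utterances, threshold=0.99):
    """Flag utterance i when adding it barely moves the context state."""
    ids, eos_positions = [], []
    for utt in utterances:
        ids += tokenizer.encode(utt + tokenizer.eos_token)
        eos_positions.append(len(ids) - 1)  # index of this utterance's EOS
    with torch.no_grad():
        hidden = encoder(torch.tensor([ids])).last_hidden_state[0]
    eos_states = hidden[eos_positions]  # h_EOS1 ... h_EOSn
    redundant = []
    for i in range(1, len(utterances)):
        sim = torch.cosine_similarity(eos_states[i], eos_states[i - 1], dim=0)
        if sim.item() > threshold:
            redundant.append(i)
    return redundant
```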

Annotation Tags

12

Summarizer

13

• Pre-trained summarizer: BART
• Non-pre-trained summarizer: PGN (pointer-generator network)

Dataset and Metrics

• Datasets
  • SAMSum
  • AMI
• Evaluation Metrics
  • ROUGE
  • BERTScore

14

[Table: Statistics for the SAMSum and AMI datasets]

Automatic Evaluation

15

[Table: Test set results on the SAMSum dataset (ROUGE and BERTScore)]

[Table: Test set results on the AMI dataset]

Human Evaluation

16

• Our model gets the best score in conciseness.
• Our model performs better in coverage.

Effect of DialoGPT_KE

17

• Entities play an important role in summary generation.

• Combined with DialoGPT embeddings, KeyBERT gets better results.

Intrinsic Evaluation For Keywords

• View reference summary words as golden keywords.

• Both TextRank and Entities perform poorly in recall.

• Our method extracts more diverse keywords.

Effect of DialoGPT_RD

18

• Rule-based method: annotates utterances without nouns, verbs and adjectives as redundant.

• Our method shows more advantages on the long and verbose meeting transcripts in AMI.

Effect of DialoGPT_TS

19

• Our method achieves results comparable to the strong baseline C99 (w/ DialoGPT embeddings).

Ablation Studies for Annotations

20

• For both datasets, summarizers trained on data with two of the three annotations surpass the corresponding summarizers trained with a single annotation type.

• Summarizers trained on D_KE+TS still obtain improvements on both datasets.

Conclusion

• We investigate using DialoGPT as an unsupervised annotator for dialogue summarization, covering keywords extraction, redundancy detection and topic segmentation.

• Experimental results show that our method consistently obtains improvements over a pre-trained summarizer (BART) and a non-pre-trained summarizer (PGN) on both datasets.

• Combining all three annotations, our summarizer achieves new state-of-the-art performance on the SAMSum dataset.

21

Thanks!