Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems


Page 1: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

1

Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems
Yun-Nung (Vivian) Chen
http://vivianchen.idv.tw

December 2015

Committee: Alexander I. Rudnicky (Chair), Anatole Gershman (Co-Chair), Alan W Black, Dilek Hakkani-Tür


Page 2: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

2

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Knowledge Acquisition

SLU Modeling

Page 3: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

3

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 4: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

4

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon Alexa/Echo

(2014)

https://www.apple.com/ios/siri/
https://www.google.com/landing/now/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
http://www.amazon.com/oc/echo/

Facebook M

(2015)

Intelligent Assistants


Page 5: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

5

Large Smart Device Population: Global Digital Statistics (January 2015)

Global Population

7.21B

Active Internet Users

3.01B

Active Social Media Accounts

2.08B

Active Unique Mobile Users

3.65B


Device input is evolving towards speech, which is more natural and convenient.

Page 6: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

6

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.).

Good SDSs help users organize and access information conveniently.

Spoken Dialogue System (SDS)


JARVIS, Iron Man's personal assistant; Baymax, a personal healthcare companion.

Page 7: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

7

ASR: Automatic Speech Recognition
SLU: Spoken Language Understanding
DM: Dialogue Management
NLG: Natural Language Generation

SDS Architecture

[Diagram: SDS pipeline ASR → SLU → DM → NLG, grounded in domain knowledge; the understanding components are marked as the current bottleneck]

Page 8: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

8

Knowledge Representation / Ontology

Traditional SDSs require manual annotations for specific domains to represent domain knowledge.

[Diagram]
Restaurant Domain: a restaurant node with price, type, and location (located_in) slots.
Movie Domain: a movie node with genre, year (released_in), and director (directed_by) slots.
Node: semantic concept/slot; Edge: relation between concepts.


Page 9: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

9

Utterance Semantic Representation An SLU model requires a domain ontology to decode utterances into semantic forms, which contain core content (a set of slots and slot-fillers) of the utterance.

Restaurant Domain: "find a cheap taiwanese restaurant in oakland" → target="restaurant", price="cheap", type="taiwanese", location="oakland"

Movie Domain: "show me action movies directed by james cameron" → target="movie", genre="action", director="james cameron"

(Domain ontologies as on the previous slide.)


Page 10: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

10

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology
2) Utterances labelled with semantic representations
3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.


seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

Page 11: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

11

Questions to Address
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?


Page 12: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

12

Interaction Example
User: find a cheap eating place for asian food

Intelligent Agent Q: How does a dialogue system process this request?

Cheap Asian eating places include Rose Tea Cafe, Little Asia, etc. Which one would you like? I can help you go there. (navigation)


Page 13: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

13

Process Pipeline

[Semantic graph: slots (seeking, target, price, food) linked by dependency relations (AMOD, NN, PREP_FOR)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

User: find a cheap eating place for asian food

Required Domain-Specific Information


Page 14: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

14

Thesis Contributions


Ontology Induction (semantic slot)


Page 15: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

15

Thesis Contributions


Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)


Page 16: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

16

Thesis Contributions


Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)


Page 17: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

17

Thesis Contributions


Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding


Page 18: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

18

Thesis Contributions


Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction


Page 19: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

19

Thesis Contributions
User: find a cheap eating place for asian food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction


Page 20: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

20

Thesis Contributions
User: find a cheap eating place for asian food

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction


Page 21: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

21

Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Diagram: unlabelled collection of restaurant-asking conversations → organized domain knowledge: slots (seeking, target, price, food, quantity) linked by dependency relations (PREP_FOR, NN, AMOD)]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


Page 22: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

22

SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Diagram: organized domain knowledge + SLU modeling → SLU component; "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation]

SLU Modeling: Semantic Decoding, Intent Prediction


Page 23: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

23

SDS Architecture – Contributions

[Diagram: SDS pipeline ASR → SLU → DM → NLG with domain knowledge; the contributions (Knowledge Acquisition, SLU Modeling) target the current bottleneck]

Page 24: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

24

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work

Knowledge Acquisition


SLU Modeling

Page 25: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

25

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 26: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

26

Ontology Induction [ASRU’13, SLT’14a]

Input: Unlabelled user utterances

Output: Slots that are useful for a domain-specific SDS

Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)Chen et al., "Leveraging Frame Semantics and Distributional Semantics for Unsupervised Semantic Slot Induction in Spoken Dialogue Systems," in Proc. of SLT, 2014.

[Diagram: unlabelled collection of restaurant-asking conversations → domain-specific slots (seeking, target, food, price, quantity)]


Idea: select a subset of FrameNet-based slots for domain-specific SDS

Best Student Paper Award

Page 27: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

27

Step 1: Frame-Semantic Parsing (Das et al., 2014)

can i have a cheap restaurant

Frame: capability

Frame: expensiveness

Frame: locale by use

Task: differentiate domain-specific frames from generic frames for SDSs

(expensiveness and locale_by_use look domain-specific; capability looks generic)


Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.

slot candidate

Page 28: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

28

Step 2: Slot Ranking Model
Compute an importance score for each slot candidate s from:
- its slot frequency in the domain-specific conversations (slots with higher frequency are more important)
- the semantic coherence of its slot fillers, measured by the cosine similarity between their word embeddings (domain-specific concepts focus on fewer topics)
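A minimal sketch of one way to combine these two signals (the filler-embedding dictionary, the interpolation weight alpha, and the exact form of the combination are illustrative assumptions, not the thesis formula):

```python
import numpy as np

def slot_score(fillers, freq, embed, alpha=0.2):
    """Importance of a slot candidate: interpolate its (log) frequency with the
    semantic coherence of its fillers, here the mean pairwise cosine similarity
    of their word embeddings."""
    vecs = [embed[w] for w in fillers if w in embed]
    sims = [float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
            for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    coherence = float(np.mean(sims)) if sims else 0.0
    return (1 - alpha) * np.log(1 + freq) + alpha * coherence
```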

Page 29: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

29

Step 3: Slot Selection
Rank all slot candidates by their importance scores (combining frequency and semantic coherence) and output the candidates whose scores exceed a threshold.

Ranked candidates: locale_by_use 0.89, capability 0.68, food 0.64, expensiveness 0.49, quantity 0.30, seeking 0.22, …
Selected slots: locale_by_use, food, expensiveness, capability, quantity


Page 30: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

30

Experiments of Ontology Induction
Dataset: Cambridge University SLU corpus [Henderson, 2012]
- restaurant recommendation (WER = 37%)
- 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

The mapping table between induced and reference slots

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.


Page 31: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

31

Experiments of Ontology Induction
Experiment: Slot Induction
Metric: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model, measuring the quality of induced slots via the mapping table

Semantic relations help decide domain-specific knowledge.

Approach | Word Embedding | ASR AP (%) | ASR AUC (%) | Transcripts AP (%) | Transcripts AUC (%)
Baseline: MLE | - | 58.2 | 56.2 | 55.0 | 53.5
Proposed: + Coherence | In-Domain Word Vec. | 67.0 | 65.8 | 58.0 | 56.5
Proposed: + Coherence | External Word Vec. | 74.5 (+39.9%) | 73.5 (+44.1%) | 65.0 (+18.1%) | 64.2 (+19.9%)


Page 32: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

32

Experiments of Ontology Induction
Sensitivity to Amount of Training Data: varying the amount of training transcripts used for ontology induction

Most approaches are not sensitive to the training data size, since the dialogues come from a single domain.

The external word vectors, trained on larger data, perform better.

[Plot: AvgP (%) vs. ratio of training data (0.2 to 1.0) for the Frequency, In-Domain Word Vector, and External Word Vector approaches]

Page 33: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

33

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 34: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

34

Structure Learning [NAACL-HLT’15]

Input: Unlabelled user utterances

Output: Slots with relations

Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.

[Diagram: unlabelled collection of restaurant-asking conversations → domain-specific ontology: slots (seeking, desiring, locale_by_use, food, expensiveness, relational_quantity) connected by dependency relations (PREP_FOR, NN, AMOD, DOBJ)]


Idea: construct a knowledge graph and then compute slot importance based on relations

Page 35: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

35

Step 1: Knowledge Graph Construction

Syntactic dependency parsing on utterances

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Example: "can i have a cheap restaurant" is dependency-parsed (nsubj, dobj, det, amod, ccomp) and tagged with the frames capability, expensiveness, locale_by_use; the word graph connects the words (can, i, have, a, cheap, restaurant) and the slot graph connects the slots (capability, expensiveness, locale_by_use)]
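As a rough illustration of the word-based lexical graph (the thesis has its own parser and graph setup; spaCy and networkx here are stand-ins used only for the sketch):

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def build_word_graph(utterances):
    """Word-based lexical knowledge graph: nodes are words, edges are the
    syntactic dependencies observed when parsing the utterances."""
    g = nx.Graph()
    for utt in utterances:
        for token in nlp(utt):
            if token.dep_ != "ROOT":
                g.add_edge(token.lower_, token.head.lower_, dep=token.dep_)
    return g

g = build_word_graph(["can i have a cheap restaurant"])
print(list(g.edges(data=True)))  # e.g. ('cheap', 'restaurant', {'dep': 'amod'})
```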


Page 36: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

36

Step 2: Edge Weight Measurement
Compute edge weights to represent relation importance:
- slot-to-slot relation L_ss: similarity between slot embeddings
- word-to-slot relation L_ws or L_sw: frequency of the slot-word pair
- word-to-word relation L_ww: similarity between word embeddings

[Graph: word nodes w1, …, w7 and slot nodes s1, s2, s3 connected by L_ww, L_ws / L_sw, and L_ss edges]


Page 37: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

37

Step 2: Slot Importance by Random Walk

Scores are propagated from the word layer and then within the slot layer.

Assumption: slots with more dependencies on more important slots should themselves be more important.

The random walk algorithm computes an importance score for each slot by combining its original frequency score with the propagated slot importance.

[Graph: word nodes w1, …, w7 and slot nodes s1, s2, s3 with L_ww, L_ws / L_sw, and L_ss edges, over which the scores are propagated]

Converged scores can represent the importance.

https://github.com/yvchen/MRRW
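A single-layer sketch of the random walk (the released MRRW code linked above also propagates through the word layer; the restart weight alpha and the toy numbers here are illustrative only):

```python
import numpy as np

def slot_importance(freq_scores, L_ss, alpha=0.9, iters=100):
    """Interpolate the original (normalized) frequency scores with scores
    propagated over the slot-to-slot relation matrix until convergence,
    a personalized-PageRank-style update."""
    r0 = freq_scores / freq_scores.sum()
    L = L_ss / L_ss.sum(axis=0, keepdims=True)  # column-normalize edge weights
    r = r0.copy()
    for _ in range(iters):
        r = (1 - alpha) * r0 + alpha * (L @ r)
    return r

freq = np.array([5.0, 3.0, 1.0])                                 # toy slot frequencies
L_ss = np.array([[0, 2, 1], [2, 0, 0], [1, 0, 0]], dtype=float)  # toy relation weights
print(slot_importance(freq, L_ss))
```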


[Resulting important slots: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring]

Page 38: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

38

Step 3: Identify Domain Slots with Relations

The converged slot importance indicates whether a slot is important.

Rank slot pairs by summing their converged slot importance, e.g. (s1, s2): r_s(1) + r_s(2); (s3, s4): r_s(3) + r_s(4); …

Select the slot pairs with higher scores according to a threshold.

[Resulting domain ontology with inter-slot relations, as shown earlier]

(The converged slot importance is evaluated in Experiment 1; the discovered slot relations in Experiment 2.)


Page 39: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

39

Experiments for Structure Learning
Experiment 1: Quality of Slot Importance

Dataset: Cambridge University SLU Corpus

Approach | ASR AP (%) | ASR AUC (%) | Transcripts AP (%) | Transcripts AUC (%)
Baseline: MLE | 56.7 | 54.7 | 53.0 | 50.8
Proposed: Random Walk via Dependencies | 71.5 (+26.1%) | 70.8 (+29.4%) | 76.4 (+44.2%) | 76.0 (+49.6%)

Dependency relations help decide domain-specific knowledge.


Page 40: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

40

Experiments for Structure Learning
Experiment 2: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Learned ontology: seeking, desiring, locale_by_use, food, expensiveness, relational_quantity connected by PREP_FOR, NN, AMOD, DOBJ relations.
Reference ontology (with the most frequent syntactic dependencies): task, type, area, food, pricerange connected by DOBJ, AMOD, PREP_IN relations.]


Page 41: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

41

Experiments for Structure Learning
Experiment 2: Relation Discovery Analysis (continued)

(Learned and reference ontologies as on the previous slide.)

The automatically learned domain ontology aligns well with the reference one.


Page 42: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

42

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 43: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

43

Surface Form Derivation [SLT’14b]

Input: a domain-specific organized ontology

Output: surface forms corresponding to entities in the ontology

Domain-Specific Ontology

Chen et al., "Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding," in Proc. of SLT, 2014.

[Example: a movie node linked to director (directed_by), genre, and char (casting) slots; derived surface forms: $char → character, actor, role, …; $genre → action, comedy, …; $director → director, filmmaker, …]


Idea: mine patterns from the web to help understanding

Page 44: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

44

Step 1: Mining Query Snippets on the Web
Mine query snippets that include entity pairs connected by specific relations in the knowledge graph.

(Avatar, James Cameron), relation directed_by: "Avatar is a 2009 American epic science fiction film directed by James Cameron."

(The Dark Knight, Christopher Nolan), relation directed_by: "The Dark Knight is a 2008 superhero film directed and produced by Christopher Nolan."


Page 45: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

45

Step 2: Training Entity Embeddings
Dependency parsing is used to train dependency-based embeddings.

[Example: "Avatar is a 2009 American epic science fiction film directed by James Cameron" is dependency-parsed (nsubj, num, det, cop, nn, vmod, prep_by), the entities are replaced by their types ($movie, $director), and the dependency contexts of each entity type (e.g. is, film, directed for $movie) serve as training contexts]

Levy and Goldberg, " Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


Page 46: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

46

Step 3: Deriving Surface Forms

Entity Surface Forms
- learn the surface forms corresponding to entities: the most similar word vectors for each entity embedding (words used in similar contexts)
- e.g. $char: "character", "role", "who"; $director: "director", "filmmaker"; $genre: "action", "fiction"

Entity Contexts
- learn the important contexts of entities: the most similar context vectors for each entity embedding (words frequently occurring together with the entity)
- e.g. $char: "played"; $director: "directed"
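A minimal sketch of the nearest-neighbour lookup behind both steps, assuming a dictionary of already-trained dependency-based embeddings (all names here are illustrative):

```python
import numpy as np

def nearest(entity_vec, vectors, k=5):
    """Return the k entries whose embeddings are most cosine-similar to the
    entity embedding; applied to word vectors this yields entity surface forms,
    applied to context vectors it yields entity contexts."""
    keys = list(vectors)
    mat = np.stack([vectors[w] for w in keys])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = entity_vec / np.linalg.norm(entity_vec)
    order = np.argsort(-(mat @ q))
    return [keys[i] for i in order[:k]]

# e.g. nearest(word_vecs["$director"], word_vecs) -> ["director", "filmmaker", ...]
```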


Page 47: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

47

Experiments of Surface Form Derivation
Knowledge Base: Freebase

◦ 670K entities; 78 entity types (movie names, actors, etc)

Entity Tag | Derived Words
$character | character, role, who, girl, she, he, officier
$director | director, dir, filmmaker
$genre | comedy, drama, fantasy, cartoon, horror, sci
$language | language, spanish, english, german
$producer | producer, filmmaker, screenwriter

The web-derived surface forms provide useful knowledge for better understanding.


Page 48: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

48

Integrated with Background Knowledge

[Pipeline: for an input utterance such as "find me some films directed by james cameron", relation evidence inferred from background knowledge (gazetteers / entity dictionary, P_E(r | w), and knowledge graph entities) is combined, via probabilistic enrichment, with local relational surface forms derived from web snippets and entity embeddings (entity surface forms P_F(r | w) and entity contexts P_C(r | w)), yielding the final relation results R_u(r)]

Hakkani-Tür et al., “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Proc. of Interspeech, 2014.
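A toy sketch of combining the three relation distributions; a simple weighted mixture is shown, whereas the cited probabilistic-enrichment method may combine them differently:

```python
def enrich(p_e, p_f, p_c, weights=(0.5, 0.3, 0.2)):
    """Combine relation scores from the gazetteer (p_e), entity surface forms
    (p_f), and entity contexts (p_c), each a dict mapping relation -> P(r | w),
    into a single normalized distribution."""
    relations = set(p_e) | set(p_f) | set(p_c)
    scores = {r: weights[0] * p_e.get(r, 0.0)
                 + weights[1] * p_f.get(r, 0.0)
                 + weights[2] * p_c.get(r, 0.0)
              for r in relations}
    z = sum(scores.values()) or 1.0
    return {r: s / z for r, s in scores.items()}

print(enrich({"directed_by": 0.6}, {"directed_by": 0.8}, {"casting": 0.5}))
```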


Page 49: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

49

Experiments of Surface Form Derivation
Relation Detection Data (NL-SPARQL): a dialogue system challenge set for converting natural language to structured queries (Hakkani-Tür et al., 2014)
- crowd-sourced utterances (3,338 for self-training, 1,084 for testing)
Metric: micro F-measure (%)

Approach | Micro F-Measure (%)
Baseline: Gazetteer | 38.9
Gazetteer + Entity Surface Form + Entity Context | 43.3 (+11.4%)

The web-derived knowledge can benefit SLU performance.

Hakkani-Tür et al., “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Proc. of Interspeech, 2014.


Page 50: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

50

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work

SLU Modeling

Knowledge Acquisition

(Knowledge acquisition: unlabelled collection of restaurant-asking conversations → organized domain knowledge)


Page 51: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

51

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 52: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

52

Semantic Decoding [ACL-IJCNLP’15]

Input: user utterances, automatically acquired knowledge

Output: the semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture: an unlabeled collection is frame-semantically parsed; ontology induction builds a semantic knowledge graph and structure learning builds a lexical knowledge graph; the feature model (word features Fw, slot features Fs) is combined with the knowledge graph propagation model (word relation model Rw, slot relation model Rs) in MF-SLU, SLU modeling by matrix factorization, which maps an utterance such as "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]


Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

Page 53: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

53

Matrix Factorization SLU (MF-SLU): Feature Model

[Matrix: rows are utterances, columns are word observations and slot candidates from ontology induction.
Train utterance 1, "i would like a cheap restaurant": words cheap, restaurant; slots expensiveness, locale_by_use.
Train utterance 2, "find a restaurant with chinese food": words restaurant, food; slots locale_by_use, food.
Test utterance, "show me a list of cheap restaurants": the hidden semantics are estimated as probabilities (e.g. .97, .95, .90, .85).]


MF completes a partially-missing matrix based on a low-rank latent semantics assumption.

Page 54: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

54

Matrix Factorization (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively

The product of the two matrices fills in the probabilities of the hidden semantics.

[The |U| × (|W|+|S|) utterance-by-feature matrix (word observations and slot candidates, with observed 1s and estimated probabilities such as .97, .95 for hidden cells) is approximated as the product of a |U| × d matrix and a d × (|W|+|S|) matrix]


Rendle et al., “BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
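A toy sketch of completing such a matrix with a BPR-style factorization in the spirit of Rendle et al. (2009); the dimensions, learning rate, and data are illustrative, not the thesis configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance-by-(word+slot) matrix: 1 = observed word/slot, 0 = hidden.
M = np.array([[1, 1, 0, 1, 1, 0],
              [0, 1, 1, 1, 0, 1]], dtype=float)
n_u, n_f, d = M.shape[0], M.shape[1], 2
U = rng.normal(scale=0.1, size=(n_u, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(n_f, d))   # latent word/slot factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, reg = 0.05, 0.01
for _ in range(5000):                       # BPR-style SGD: observed cells should
    u = rng.integers(n_u)                   # outrank unobserved cells per utterance
    pos = rng.choice(np.flatnonzero(M[u] == 1))
    neg = rng.choice(np.flatnonzero(M[u] == 0))
    uu, vp, vn = U[u].copy(), V[pos].copy(), V[neg].copy()
    g = sigmoid(-(uu @ (vp - vn)))
    U[u]   += lr * (g * (vp - vn) - reg * uu)
    V[pos] += lr * (g * uu - reg * vp)
    V[neg] += lr * (-g * uu - reg * vn)

print(np.round(sigmoid(U @ V.T), 2))        # estimated probabilities of hidden semantics
```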

Page 55: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

55

Reasoning with Matrix Factorization

Matrix Factorization SLU (MF-SLU): Feature Model + Knowledge Graph Propagation Model

[The feature matrix (word observations and induced slot candidates per utterance) is combined, via the × operation shown, with a word relation matrix (word relation model) and a slot relation matrix (slot relation model) obtained from slot induction and structure learning, so that the learned structure propagates into the training cells before factorization]


Structure information is integrated to make the self-training data more reliable before MF.

Page 56: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

56

Experiments of Semantic Decoding
Experiment 1: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for each utterance


Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

Page 57: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

57

Experiments of Semantic Decoding
Experiment 1: Quality of Semantics Estimation (continued)

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for each utterance

The MF-SLU effectively models implicit information to decode semantics.

The structure information further improves the results.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6* | 45.3*
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5* (+27.9%) | 53.4* (+37.6%)


*: the result is significantly better than the MLR with p < 0.05 in t-test

Page 58: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

58

Experiments of Semantic Decoding
Experiment 2: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for each utterance

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic relations) | 41.4* | 51.6*
Feature + Knowledge Graph Propagation (Dependency relations) | 41.6* | 49.0*
Feature + Knowledge Graph Propagation (All) | 43.5* (+15.7%) | 53.4* (+17.9%)


*: the result is significantly better than the MLR with p < 0.05 in t-test

Page 59: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

59

Low- and High-Level Understanding
Semantic concepts for individual utterances do not capture high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation


Page 60: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

60

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 61: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

61

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT’14c]


Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

Page 62: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

62

Input: single-turn request

Output: the apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Example of reasoning with feature-enriched MF: the test utterance "i would like to contact alex" is enriched with the semantic feature "communication"; app descriptions retrieved by IR (e.g. Outlook: "… your email, calendar, contacts…", Gmail: "… check and send emails, msgs …") provide self-training utterances; the matrix over lexical features (contact, message, email, …), enriched semantics, and intended apps (Gmail, Outlook, Skype, …) is completed to score the intended apps (e.g. .90, .97, .95)]


The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

Page 63: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

63

Intent Prediction – Multi-Turn Interaction [ICMI’15]

Input: multi-turn interaction

Output: the app the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/.


[Example: "send to vivian": Email? Message? (Communication)]

Idea: Behavioral patterns in the interaction history (the previous turn) can help intent prediction.

Page 64: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

64

Intent Prediction – Multi-Turn Interaction [ICMI’15]

Input: multi-turn interaction

Output: the app the user plans to launch

[Example of reasoning with feature-enriched MF over multi-turn dialogues:
Train dialogue: "take this photo" → CAMERA; "tell vivian this is me in the lab" → IM
Train dialogue: "check my grades on website" → CHROME; "send an email to professor" → EMAIL
Test dialogue: "take a photo of this" → CAMERA; "send it to alice" → IM
The matrix combines lexical features (photo, check, camera, tell, send, email, …), behavior history (null, camera, chrome, email, …), and the intended app, and MF fills in scores (e.g. .95, .85, .80, .70, .55) for the test turns]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/.


The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Page 65: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

65

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / - | 26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / - | 55.5 / -

LM: LM-based IR model (unsupervised); MLR: Multinomial Logistic Regression (supervised)


Page 66: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


Page 67: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / - | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / - | 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / - | 56.6 / -

Semantic enrichment provides rich cues to improve performance.


Page 68: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


Page 69: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

69

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work

SLU Modeling

(SLU modeling: organized domain knowledge + SLU model; "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation)


Page 70: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

70

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 71: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

71

SLU in Human-Human Dialogues
Computing devices are now easily accessible during regular human-human conversations.

The dialogues include discussions for identifying speakers’ next actions.


Is it possible to apply the techniques developed for human-machine dialogues to human-human dialogues?

Page 72: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

72

Actionable Item Utterances
Example: "Will Vivian come here for the meeting?" → Find Calendar Entry


Page 73: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

73

Goal: provide actions an existing system can handle w/o interrupting conversations

Assumption: some actions and associated arguments can be shared across genres

Human-Machine Genre, create_calendar_entry: "schedule a meeting with John this afternoon"
Human-Human Genre, create_calendar_entry: "how about the three of us discuss this later this afternoon?" (more casual, includes conversational terms)

Task: multi-class utterance classification
- train on the available human-machine genre
- test on the human-human genre

Adaptation
- model adaptation
- embedding vector adaptation

Chen et al., "Detecting Actionable Items in Meetings by Convolutional Deep Structred Semantic Models," in Proc. of ASRU, 2015.


(slots in both examples: contact_name, start_time)

Page 74: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

74

Action Item Detection
Idea: human-machine interactions may help detect actionable items in human-human conversations.

create_calendar_entry examples:
- "12 30 on Monday the 2nd of July schedule meeting to discuss taxes with Bill, Judy and Rick."
- "12 PM lunch with Jessie"
- "12:00 meeting with cleaning staff to discuss progress"
- "2 hour business presentation with Rick Sandy on March 8 at 7am"

send_email examples:
- "A read receipt should be sent in Karen's email."
- "Activate meeting delay by 15 minutes. Inform the participants."
- "Add more message saying "Thanks"."

(Utterances and actions play the query and document roles of an IR setup.)

Convolutional deep structured semantic models for IR may be useful for this task.


Page 75: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

75

Convolutional Deep Structured Semantic Models (CDSSM) (Huang et al., 2013; Shen et al., 2014)

[Architecture: an utterance such as "how about we discuss this later" (word sequence x, words w1, …, wd) passes through a word hashing layer l_h (20K, word hashing matrix W_h), a convolutional layer l_c (1000, convolution matrix W_c), a max pooling layer l_m (300), and a semantic projection matrix W_s to produce the semantic layer y; the utterance embedding U is compared with action embeddings A_1, …, A_n via CosSim(U, A_i) to produce P(A_1 | U), …, P(A_n | U)]

Huang et al., "Learning deep structured semantic models for web search using clickthrough data," in Proc. of CIKM, 2013.
Shen et al., "Learning semantic representations using convolutional neural networks for web search," in Proc. of WWW, 2014.

Training maximizes the likelihood of the associated actions given the utterances.
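The posterior over actions in these models is a softmax over scaled cosine similarities; a small sketch (gamma is a smoothing factor, its value illustrative):

```python
import numpy as np

def action_posteriors(u_vec, action_vecs, gamma=10.0):
    """P(A_i | U): softmax over the cosine similarities between the utterance
    embedding and each candidate action embedding."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(u_vec, a) for a in action_vecs])
    exps = np.exp(gamma * (sims - sims.max()))  # subtract max for numerical stability
    return exps / exps.sum()
```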

Page 76: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

76

Adaptation
Issue: mismatch between a source genre and a target genre

Solution:
- Adapting CDSSM: continually train the CDSSM using data from the target genre
- Adapting Action Embeddings: move the learned action embeddings close to the observed corresponding utterance embeddings


Page 77: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

77

Adapting Action Embeddings
The mismatch may result in inaccurate action embeddings.

[Embedding space: action embeddings (Action1, Action2, Action3) trained on the source genre sit apart from the utterance embeddings generated on the target genre]

• Action embedding: trained on the source genre

• Utterance embedding: generated on the target genre

Idea: moving action embeddings close to the observed utterance embeddings from the target genre


Page 78: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

78

Adapting Action Embeddings (continued)
Learn adapted action embeddings by minimizing an objective:

[Embedding space: each action embedding is moved towards the cluster of its corresponding target-genre utterance embeddings]


The objective combines (1) the distance between the original and the new action embeddings and (2) the distance between the new action embedding and its corresponding utterance embeddings.
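One plausible form of such an objective (the notation and the trade-off weight lambda are assumed here for illustration, not taken from the slide):

$$\hat{A}_k = \arg\min_{A} \; \lVert A - A_k \rVert^2 \;+\; \lambda \sum_{u \in \mathcal{U}_k} \lVert A - y_u \rVert^2$$

where A_k is the original action embedding, y_u are the CDSSM embeddings of the target-genre utterances associated with action k, and lambda balances staying close to the original embedding against moving towards the observed utterances.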

The actionable scores can be measured by the similarity between utterance embeddings and adapted action embeddings.

Page 79: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

79

Iterative Ontology Refinement
Actionable information may help refine the induced domain ontology.
- Intent: create_single_reminder
- Higher score: an utterance with more core content
- Lower score: a less important utterance

The slot ranking then combines an actionable-score-weighted frequency with semantic coherence.
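One way to write this weighting (the notation follows the earlier slot ranking model and is assumed here for illustration):

$$\tilde{f}(s) = \sum_{u} a(u)\,\mathbb{1}[s \in u], \qquad w(s) = (1-\alpha)\,\log \tilde{f}(s) + \alpha\, h(s)$$

where a(u) is the actionable score of utterance u, the weighted frequency replaces the raw slot frequency, and h(s) is the semantic coherence of the slot's fillers.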

[Iterative loop between Ontology Induction, SLU Modeling, and Actionable Item Detection]

The iterative framework can benefit understanding in both human-machine and human-human conversations.


Utt1: 0.7 (Receiving, Commerce_pay); Utt2: 0.2 (Capability)

Unlabeled Collection

Page 80: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

80

Experiments of Human-Human SLU
Experiment 1: Actionable Item Detection
Dataset: 22 meetings from the ICSI meeting corpus (3 types of meetings: Bed, Bmr, Bro)

Identified action: find_calendar_entry, create_calendar_entry, open_agenda, add_agenda_item, create_single_reminder, send_email, find_email, make_call, search, open_setting

Annotating agreement: Cohen’s Kappa = 0.64~0.67

[Bar chart: distribution of the identified action types across the Bed, Bmr, and Bro meeting types]


Data & Model Available at http://research.microsoft.com/en-us/projects/meetingunderstanding/

Page 81: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

81

Experiments of Human-Human SLU
Experiment 1: Actionable Item Detection
Metric: the average AUC over the 10 actions plus others

Approach | Mismatch-CDSSM | Adapt-CDSSM
Original Similarity | 49.1 | 50.4
Similarity based on Adapted Embeddings | 55.8 (+13.6%) | 60.1 (+19.2%)

Mismatch-CDSSM: trained on Cortana data; Adapt-CDSSM: then continually trained on meeting data (model adaptation). The adapted-embedding rows use action embedding adaptation.

Two adaptation approaches are useful to overcome the genre mismatch.

Page 82: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

82

Experiments of Human-Human SLU
Experiment 1: Actionable Item Detection

Baselines: lexical n-grams and the semantic paragraph vector (Le and Mikolov, 2014); classifier: SVM with an RBF kernel

Model | AUC (%)
Baseline: N-gram (N=1,2,3) | 52.84
Baseline: Paragraph Vector (doc2vec) | 59.79
Proposed: CDSSM Adapted Vector | 69.27

The CDSSM semantic features outperform lexical n-grams and paragraph vectors, where about 70% actionable items in meetings can be detected.


Le and Mikolov, "Distributed Representations of Sentences and Documents," in Proc. of ICML, 2014.

Page 83: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

83

Experiments of Human-Human SLU
Experiment 2: Iterative Ontology Refinement

Dataset: 155 conversations between customers and agents from the MetLife call center; 5,229 automatically segmented utterances (WER = 31.8%)

Annotation agreement on actionable utterances (customer intents or agent actions): Cohen's Kappa = 0.76

Reference Ontology
- Slots: frames selected by annotators (31 reference slots)
- Structure: slot pairs with dependency relations

FrameNet coverage: 79.5%; 8 additional important concepts: cancel, refund, delete, discount, benefit, person, status, care


Page 84: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

84

Experiments of Human-Human SLU
Experiment 2: Iterative Ontology Refinement
Metric: AUC of the ranked lists of slots and slot pairs

Approach | ASR Slot | ASR Structure | Transcripts Slot | Transcripts Structure
Baseline: MLE | 43.4 | 11.4 | 59.5 | 25.9
Proposed: External Word Vec | 49.6 | 12.8 | 64.7 | 40.2

The proposed ontology induction significantly improves the baseline in terms of slot and structure performance for multi-domain dialogues.


Page 85: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

85

Experiments of Human-Human SLU
Experiment 2: Iterative Ontology Refinement
Metric: AUC of the ranked lists of slots and slot pairs

Approach | ASR Slot | ASR Structure | Transcripts Slot | Transcripts Structure
Baseline: MLE | 43.4 | 11.4 | 59.5 | 25.9
  + Actionable Score (Proposed) | | | |
  + Actionable Score (Oracle) | | | |
Proposed: External Word Vec | 49.6 | 12.8 | 64.7 | 40.2
  + Actionable Score (Proposed) | | | |
  + Actionable Score (Oracle) | | | |

Proposed: the estimated actionable scores from actionable item detection; Oracle: ground truth of actionable utterances (upper bound).

The proposed ontology induction significantly improves the baseline in terms of slot and structure performance for multi-domain dialogues.

Page 86: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

86

Experiments of Human-Human SLU
Experiment 2: Iterative Ontology Refinement
Metric: AUC of the ranked lists of slots and slot pairs

Approach | ASR Slot | ASR Structure | Transcripts Slot | Transcripts Structure
Baseline: MLE | 43.4 | 11.4 | 59.5 | 25.9
  + Actionable Score (Proposed) | 42.9 | 11.3 | |
  + Actionable Score (Oracle) | 44.3 | 12.2 | |
Proposed: External Word Vec | 49.6 | 12.8 | 64.7 | 40.2
  + Actionable Score (Proposed) | 49.2 | 12.8 | |
  + Actionable Score (Oracle) | 48.4 | 12.9 | |

Actionable information does not significantly improve the ASR results due to the high WER.


Page 87: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

87

Experiments of Human-Human SLU
Experiment 2: Iterative Ontology Refinement
Metric: AUC of the ranked lists of slots and slot pairs

Approach | ASR Slot | ASR Structure | Transcripts Slot | Transcripts Structure
Baseline: MLE | 43.4 | 11.4 | 59.5 | 25.9
  + Actionable Score (Proposed) | 42.9 | 11.3 | 59.8 | 26.6
  + Actionable Score (Oracle) | 44.3 | 12.2 | 66.7 | 37.8
Proposed: External Word Vec | 49.6 | 12.8 | 64.7 | 40.2
  + Actionable Score (Proposed) | 49.2 | 12.8 | 65.0 | 40.5
  + Actionable Score (Oracle) | 48.4 | 12.9 | 82.4 | 56.9

Actionable information significantly improves the performance for transcripts.


The iterative ontology refinement is feasible, and it shows the potential room for improvement.

Page 88: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

88

Outline Introduction

Ontology Induction [ ASRU’13, SLT’14a]

Structure Learning [NAACL-HLT’15]

Surface Form Derivation [SLT’14b]

Semantic Decoding [ACL-IJCNLP’15]

Intent Prediction [SLT’14c, ICMI’15]

SLU in Human-Human Conversations [ASRU’15]

Conclusions & Future Work


Page 89: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

89

Summary of Contributions

Knowledge Acquisition
- Ontology Induction: semantic relations are useful.
- Structure Learning: dependency relations are useful.
- Surface Form Derivation: web-derived surface forms benefit SLU.

SLU Modeling
- Semantic Decoding: the MF-SLU decodes semantics.
- Intent Prediction: the feature-enriched MF-SLU predicts intents.

SLU in Human-Human Conversations
- CDSSM learns intent embeddings to detect actionable utterances, which may help ontology refinement in an iterative framework.


Page 90: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

90

Conclusions
The dissertation shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs; the proposed techniques work for both human-machine and human-human conversations.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge, and then allows systems to consider implicit semantics for better understanding.

- Better semantic representations for individual utterances
- Better high-level intent prediction about follow-up behaviors


Page 91: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

91

Future Work
Apply the proposed technology to domain discovery
- domains that are not covered by current systems but that users are interested in
- guide which domains to develop next

Improve the proposed approach by handling the uncertainty


[Diagram: ASR → SLU (SLU modeling + knowledge acquisition); recognition errors and unreliable knowledge are the main sources of uncertainty]

Active Learning for SLU
- without labels: data selection, filtering uncertain instances
- with explicit labels: crowd-sourcing
- with implicit labels: successful interactions imply pseudo labels

Topic Prediction for ASR Improvement
- lexicon expansion with potential OOVs
- LM adaptation
- lattice rescoring

Page 92: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

92

Take Home Message
Big data without annotations is available.

Main challenge: how to acquire and organize important knowledge, and further utilize it for applications


Unsupervised or weakly-supervised methods will be the future trend!

Page 93: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

93

Q & A
Thanks for your attention!
Thanks to my committee members for their helpful feedback.
