42
Applied Deep Learning Yun-Nung (Vivian) Chen Deep Learning for Dialogue Modeling YUN-NUNG (VIVIAN) CHEN 陳陳陳 VIVIANCHEN.IDV.TW Mar 29 th , 2017 @NTHU

Deep Learning for Dialogue Modeling - NTHU

Embed Size (px)

Citation preview

Page 1: Deep Learning for Dialogue Modeling - NTHU

Applied Deep LearningYun-Nung (Vivian) Chen

Deep Learning for Dialogue ModelingYUN-NUNG (VIVIAN) CHEN 陳縕儂VIVIANCHEN.IDV.TW

Mar 29th, 2017

@NTHU

Page 2: Deep Learning for Dialogue Modeling - NTHU

2

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 3: Deep Learning for Dialogue Modeling - NTHU

3

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 4: Deep Learning for Dialogue Modeling - NTHU

4

Apple Siri (2011)

Google Now (2012)

Facebook M & Bot (2015)

Intelligent Assistants

Google Home (2016)

Microsoft Cortana (2014)

Amazon Alexa/Echo (2014)

Page 5: Deep Learning for Dialogue Modeling - NTHU

5

Why We Need?

Daily Life Usage◦ Weather◦ Schedule◦ Transportation◦ Restaurant Seeking

Page 6: Deep Learning for Dialogue Modeling - NTHU

6

Why We Need?– Get things done

• E.g. set up alarm/reminder, take note

– Easy access to structured data, services and apps• E.g. find docs/photos/restaurants

– Assist your daily schedule and routine• E.g. commute alerts to/from work

– Be more productive in managing your work and personal life

Page 7: Deep Learning for Dialogue Modeling - NTHU

7

Why Natural Language?• Global Digital Statistics (2015 January)

Global Population

7.21B

Active Internet Users

3.01B

Active Social Media Accounts

2.08B

Active Unique Mobile Users

3.65B

The more natural and convenient input of devices evolves towards speech.

Page 8: Deep Learning for Dialogue Modeling - NTHU

8

Intelligent Assistant Architecture

Reactive Assistance

ASR, LU, Dialog, LG, TTS

Proactive Assistance

Inferences, User Modeling, Suggestions

Data Back-end Data

Bases, Services and Client Signals

Device/Service End-points(Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience“restaurant suggestions”“call taxi”

Page 9: Deep Learning for Dialogue Modeling - NTHU

9

• Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.

• Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc).

Spoken Dialogue System (SDS)

JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion

Good dialogue systems assist users to access information conveniently and finish tasks efficiently.

Page 10: Deep Learning for Dialogue Modeling - NTHU

10

APP BOT

Seamless and automatic information transferring across domains reduce duplicate information and interaction

• A bot is responsible for a “single” domain, similar to an app

愛食記地圖LINE

Goal: Schedule a lunch with Vivian

KKBOX

Page 11: Deep Learning for Dialogue Modeling - NTHU

11

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 12: Deep Learning for Dialogue Modeling - NTHU

12

System Framework

Speech Recognition

Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy

Decision

Output Generation

Hypothesisare there any action movies to see this weekend

Semantic Framerequest_moviegenre=action, date=this weekend

System Action/Policyrequest_location

Text responseWhere are you located?

Screen Displaylocation?

Text InputAre there any action movies to see this weekend?

Speech Signal current bottleneck error propagation

Page 13: Deep Learning for Dialogue Modeling - NTHU

13

Interaction Example

User

Intelligent Agent Q: How does a dialogue system process this request?

Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

find a good eating place for taiwanese food

Page 14: Deep Learning for Dialogue Modeling - NTHU

14

System Framework

Speech Recognition

Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy

Decision

Output Generation

Hypothesisare there any action movies to see this weekend

Semantic Framerequest_moviegenre=action, date=this weekend

System Action/Policyrequest_location

Text responseWhere are you located?

Screen Displaylocation?

Text InputAre there any action movies to see this weekend?

Speech Signal

Page 15: Deep Learning for Dialogue Modeling - NTHU

15

1. Domain IdentificationRequires Predefined Domain Ontology

find a good eating place for taiwanese foodUser

Organized Domain Knowledge (Database)Intelligent Agent

Restaurant DB Taxi DB Movie DB

Classification!

Page 16: Deep Learning for Dialogue Modeling - NTHU

16

2. Intent DetectionRequires Predefined Schema

find a good eating place for taiwanese foodUser

Intelligent Agent

Restaurant DB

FIND_RESTAURANTFIND_PRICEFIND_TYPE

:

Classification!

Page 17: Deep Learning for Dialogue Modeling - NTHU

17

3. Slot FillingRequires Predefined Schema

find a good eating place for taiwanese foodUser

Intelligent Agent

Restaurant DB

Restaurant Rating TypeRest 1 good TaiwaneseRest 2 bad Thai

: : :

FIND_RESTAURANTrating=“good”type=“taiwanese”

SELECT restaurant { rest.rating=“good” rest.type=“taiwanese”}Semantic Frame Sequence Labeling

O O B-rating O O O B-type O

Page 18: Deep Learning for Dialogue Modeling - NTHU

18

System Framework

Speech Recognition

Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy

Decision

Output Generation

Hypothesisare there any action movies to see this weekend

Semantic Framerequest_moviegenre=action, date=this weekend

System Action/Policyrequest_location

Text responseWhere are you located?

Screen Displaylocation?

Text InputAre there any action movies to see this weekend?

Speech Signal

Page 19: Deep Learning for Dialogue Modeling - NTHU

19

State TrackingRequires Hand-Crafted States

User

Intelligent Agent

find a good eating place for taiwanese food

location rating type

loc, rating rating, type loc, type

all

i want it near to my office

NULL

Page 20: Deep Learning for Dialogue Modeling - NTHU

20

State TrackingRequires Hand-Crafted States

User

Intelligent Agent

find a good eating place for taiwanese food

location rating type

loc, rating rating, type loc, type

all

i want it near to my office

NULL

Page 21: Deep Learning for Dialogue Modeling - NTHU

21

State TrackingHandling Errors and Confidence

User

Intelligent Agent

find a good eating place for taixxxx food

FIND_RESTAURANTrating=“good”type=“taiwanese”

FIND_RESTAURANTrating=“good”type=“thai”

FIND_RESTAURANTrating=“good”

location rating type

loc, rating rating, type loc, type

all

NULL

?

?

Page 22: Deep Learning for Dialogue Modeling - NTHU

22

Policy for Agent Action• Inform

– “The nearest one is at Taipei 101”

• Request– “Where is your home?”

• Confirm– “Did you want Taiwanese food?”

• Database Search

• Task Completion / Information Display– ticket booked, weather information

Din Tai Fung::

Page 23: Deep Learning for Dialogue Modeling - NTHU

23

System Framework

Semantic Framerequest_moviegenre=action, date=this weekend

Speech Recognition

Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy

Decision

Hypothesisare there any action movies to see this weekend

Text InputAre there any action movies to see this weekend?

Speech Signal

Output Generation

System Action/Policyrequest_location

Text responseWhere are you located?

Screen Displaylocation?

Page 24: Deep Learning for Dialogue Modeling - NTHU

24

Output / NL Generation• Inform

– “The nearest one is at Taipei 101” v.s.

• Request– “Where is your home?” v.s.

• Confirm– “Did you want Taiwanese food?”

Page 25: Deep Learning for Dialogue Modeling - NTHU

25

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 26: Deep Learning for Dialogue Modeling - NTHU

Challenge• Predefined semantic schemaChen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015.

• Data without annotationsChen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016.

• Semantic concept interpretationChen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014.

• Predefined dialogue statesChen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.

• Error propagationHakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.

• Cross-domain intention/bot hierarchySun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016.Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016.Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016.

• Cross-domain information transferringKim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015.

FIND_RESTAURANTrating=“good” rating=5? 4?

HotelRest Flight

Travel

Trip Planning

26

Page 27: Deep Learning for Dialogue Modeling - NTHU

27

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 28: Deep Learning for Dialogue Modeling - NTHU

A Single Neuron

z

1w

2w

Nw…

1x

2x

Nx

b

z z

zbias

y

zez

1

1

Sigmoid function

Activation function

1

w, b are the parameters of this neuron28

Page 29: Deep Learning for Dialogue Modeling - NTHU

A Single Neuron

z

1w

2w

Nw…

1x

2x

Nx

bbias

y

1

5.0"2" 5.0"2"

ynotyis

A single neuron can only handle binary classification29

MN RRf :

Page 30: Deep Learning for Dialogue Modeling - NTHU

A Layer of Neurons

• Handwriting digit classification MN RRf :

A layer of neurons can handle multiple possible output,and the result depends on the max one

1x

2x

Nx

1

1y

……“1” or not

“2” or not

“3” or not

2y

3y

10 neurons/10 classes

Which one is max?

Page 31: Deep Learning for Dialogue Modeling - NTHU

Deep Neural Network (DNN)

• Fully connected feedforward network

1x

2x

……

Layer 1

……

1y

2y

……

Layer 2…

…Layer L

……

……

……

Input Output

MyNx

vector x

vector y

Deep NN: multiple hidden layers

MN RRf :

Page 32: Deep Learning for Dialogue Modeling - NTHU

32

RNN for SLU• IOB Sequence Labeling for Slot Filling

• Intent Classification

𝑤0 𝑤1 𝑤2 𝑤𝑛

h0𝑓 h1

𝑓 h2𝑓 h𝑛

𝑓

h0𝑏 h1

𝑏 h2𝑏 h𝑛

𝑏

𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛

(a) LSTM (b) LSTM-LA (c) bLSTM-LA

(d) Intent LSTM

intent

𝑤0 𝑤1 𝑤2 𝑤𝑛

h0 h1 h2 h𝑛

𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛

𝑤0 𝑤1 𝑤2 𝑤𝑛

h0 h1 h2 h𝑛

𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛

𝑤0 𝑤1 𝑤2 𝑤𝑛

h0 h1 h2 h𝑛

Page 33: Deep Learning for Dialogue Modeling - NTHU

33

RNN for SLU• Joint Multi-Domain Intent Prediction and Slot Filling

– Information can mutually enhanced

semantic frame sequence

ht-1 ht+1htW W W W

taiwanese

B-type

U

food

U

please

U

VO

VO

V

hT+1

EOS

U

FIND_RESTV

Slot Tagging Intent Prediction

Hakkani-Tur, et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.

Page 34: Deep Learning for Dialogue Modeling - NTHU

34

just sent email to bob about fishing this weekend

O O O OB-contact_name

OB-subject I-subject I-subject

U

S

I send_emailD communication

send_email(contact_name=“bob”, subject=“fishing this weekend”)

are we going to fish this weekend

U1

S2 send_email(message=“are we going to fish this weekend”)

send email to bob

U2

send_email(contact_name=“bob”)

B-messageI-message

I-message I-message I-messageI-message I-message

B-contact_nameS1

Single Turn

Multi-Turn

Domain Identification Intent Prediction Slot Filling

Contextual SLU (Chen et al., 2016)

Page 35: Deep Learning for Dialogue Modeling - NTHU

35

u

Knowledge Attention Distributionpi

mi

Memory Representation

Weighted Sum h

∑ Wkg

oKnowledge Encoding

Representationhistory utterances {xi}

current utterance

c

Inner Product

Sentence EncoderRNNin

x1 x2 xi…

Contextual Sentence Encoder

x1 x2 xi…

RNNmem

slot tagging sequence y

ht-1 ht

V V

W W W

wt-1 wt

yt-1 yt

U U

RNN Tagger

M M

Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.

1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding

Contextual SLU (Chen et al., 2016)

Idea: additionally incorporating contextual knowledge during slot tagging track dialogue states in a latent way

Page 36: Deep Learning for Dialogue Modeling - NTHU

36

E2E Supervised Dialogue System

Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv.:1604.04562v2.

0 0 0 … 0 1

Database Operator

Copy field

Database

Seven daysCurry PrinceN

irala

Royal StandardLittle Seuol

DB pointer

Can I have korean

Korean 0.7British 0.2French 0.1

Belief Tracker

Intent Network

Can I have <v.food>

Generation Network <v.name> serves great <v.food> .

Policy Network

zt

ptxt

MySQL query:“Select * where food=Korean”

qt

Page 37: Deep Learning for Dialogue Modeling - NTHU

37

InfoBot: E2E Dialogue System with Supervised & Reinforcement Learning

Dhingra, et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv.:1609.00777v2.

Movie=?; Actor=Bill Murray; Release Year=1993

Find me the Bill Murray’s movie.

I think it came out in 1993.

When was it released?

Groundhog Day is a Bill Murray movie which came out in 1993.

KB-InfoBotUser

(Groundhog Day, actor, Bill Murray)(Groundhog Day, release year, 1993)(Australia, actor, Nicole Kidman)(Mad Max: Fury Road, release year, 2015)

Knowledge Base (head, relation, tail)

Idea: differentiable database for propagating the gradients

Page 38: Deep Learning for Dialogue Modeling - NTHU

TC-Bot: E2End Task-Completion Bot

38

wi

B-type

wi+1 wi+2

O O

EOS

<intent>

wi

B-type

wi+1 wi+2

O O

EOS

<intent>

Semantic Framerequest_moviegenre=action, date=this weekend

System Action / Policyrequest_location

User Dialogue ActionInform(location=San Francisco)

Time t-1

wi

<slot>

wi+1 wi+2

O O

EOS

<intent>

Language Understanding (LU)

Time t-2

Time t

Dialogue Management

(DM)

w0 w1 w2

Natural Language Generation (NLG)EOSUser

Goal

User Agenda Modeling

User SimulatorEnd-to-End Neural Dialogue System

Text InputAre there any action movies to see this weekend?

Idea: supervised learning for each component and reinforcement learning for end-to-end training the neural dialogue system

Li, et al., “End-to-End Task-Completion Neural Dialogue Systems,” arXiv.:1703.01008.

Page 39: Deep Learning for Dialogue Modeling - NTHU

39

RL TC-BotUser goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.

RULE BASED SYSTEMUser: Can I get 2 tickets for the witch?Agent: Which theater do you want?User: Regal meridian 16Agent: What time would you like to see it?User: 9:30 pmAgent: What date would you like to see it?User: Tomorrow!Agent: How many people are you?User: Just two adults.Agent: What is your location?User: Seattle!Agent: What kind of video format do you like?User: Oh, any kind is okay. Agent: Do you have any price constraint?User: No, I don’t care.Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.User: Thank you.Agent: Thank you!

REINFORCEMENT LEARNING SYSTEMUser: Can I get 2 tickets for the witch?Agent: What date would you like to see it?User: TomorrowAgent: Which theater do you prefer?User: Regal meridian 16Agent: What is your location?User: SeattleAgent: What time would you like to see it?User: 9:30 pm.Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.User: Thanks.Agent: Thanks!

The system can learn how to efficiently interact with users for task completion

Page 40: Deep Learning for Dialogue Modeling - NTHU

40

Outline

IntroductionFramework

Natural/Spoken Language Understanding

Dialogue Management

Output Generation

ChallengeRecent TrendConclusion

Page 41: Deep Learning for Dialogue Modeling - NTHU

41

Conclusion• The conversational systems can manage information access

via spoken interactions• A domain is usually constrained by the backend service

– Semantic schema should be predefined– Cross-domain knowledge and intention is difficult to handled

• NN-Based Dialogue System– Pipeline outputs are represented as vectors distributional

• Semantic frames as vectors to encode confidence

• Implicitly represent dialogue states in hidden vectors

– The execution is constrained by backend services symbolic

Page 42: Deep Learning for Dialogue Modeling - NTHU

42

Q & AT H A N K S F O R Y O U R AT T E N T I O N !