37
Learning to Interpret Natural Language Instructions Monica Babeş-Vroman + , James MacGlashan * , Ruoyuan Gao + , Kevin Winner * Richard Adjogah * , Marie desJardins * , Michael Littman + and Smaranda Muresan ++ *Department of Computer Science and Electrical Engineering University of Maryland, Baltimore County +Computer Science Department, Rutgers University ++School of Communication and Information, Rutgers University NAACL-2012 WORKSHOP From Words to Actions: Semantic Interpretation in an Actionable Context

Learning to Interpret Natural Language Instructions

  • Upload
    magda

  • View
    43

  • Download
    0

Embed Size (px)

DESCRIPTION

Learning to Interpret Natural Language Instructions. Monica Babeş-Vroman + , James MacGlashan * , Ruoyuan Gao + , Kevin Winner * Richard Adjogah * , Marie desJardins * , Michael Littman + and Smaranda Muresan ++ *Department of Computer Science and Electrical Engineering - PowerPoint PPT Presentation

Citation preview

Page 1: Learning to Interpret Natural Language Instructions

Learning to Interpret Natural Language Instructions

Monica Babeş-Vroman+ , James MacGlashan*, Ruoyuan Gao+ , Kevin Winner*

Richard Adjogah* , Marie desJardins* , Michael Littman+ and Smaranda Muresan++

*Department of Computer Science and Electrical EngineeringUniversity of Maryland, Baltimore County

+Computer Science Department, Rutgers University++School of Communication and Information, Rutgers University

NAACL-2012 WORKSHOPFrom Words to Actions: Semantic Interpretation in an Actionable Context

Page 2: Learning to Interpret Natural Language Instructions

Outline

• Motivation and Problem• Our approach• Pilot study• Conclusions and future work

Page 3: Learning to Interpret Natural Language Instructions

Motivation

Bring me the red mug that I left in the conference room

Page 4: Learning to Interpret Natural Language Instructions

Motivation

Page 5: Learning to Interpret Natural Language Instructions

Motivation

Page 6: Learning to Interpret Natural Language Instructions

Train an artificial agent to learn to carry out complex multistep tasks specified in natural language.

Problem

Page 7: Learning to Interpret Natural Language Instructions

Another Example of A Task of pushing an object to a room. Ex : square and red room

Abstract task: move obj to color room

move square to red room move star to green room go to green room

Page 8: Learning to Interpret Natural Language Instructions

Outline

• What is the problem?• Our approach• Pilot study• Conclusions and future work

Page 9: Learning to Interpret Natural Language Instructions

Training Data:

S2

F1 F2 F3 F4 F5 F6

S3

S4

S1

Object-oriented Markov Decision Process (OO-MDP) [Diuk et al., 2008]

Page 10: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Page 11: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Task Learning from Demonstrations

Semantic Parsing

Task Abstraction

F2 F6

Our Approach

push(star1, room1)P1(room1, teal)

F1 F2 F3 F4 F5 F6

Page 12: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Task Learning from Demonstrations

Semantic Parsing

Task Abstraction

F2 F6

Our Approach

“teal”push(star1, room1)P1(room1, teal) P1 color

“push“ objToRoom

Page 13: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Task Learning from Demonstrations

Semantic Parsing

Task Abstraction

Our Approach

Page 14: Learning to Interpret Natural Language Instructions

π*

S1

S2 S3

S430%70%

0%100%

100%0%

0%0%

RL

Task Learning from Demonstration

= Inverse Reinforcement Learning (IRL)

• But first let’s see Reinforcement Learning Problem

S1

S2 S3

S4MDP (S, A, T, γ)

R

Page 15: Learning to Interpret Natural Language Instructions

πE

S1

S2 S3

S430%70%

0%100%

100%0%

0%0%

Task Learning From Demonstrations

• Inverse Reinforcement Learning

Page 16: Learning to Interpret Natural Language Instructions

πE

S1

S2 S3

S430%70%

0%100%

100%0%

0%0%

Task Learning From Demonstrations

• Inverse Reinforcement Learning

Page 17: Learning to Interpret Natural Language Instructions

πE

S1

S2 S3

S430%70%

0%100%

100%0%

0%0%

R

Task Learning From Demonstrations

• Inverse Reinforcement Learning

S1

S2 S3

S4MDP (S, A, T, γ)

IRL

Page 18: Learning to Interpret Natural Language Instructions

• Maximum Likelihood IRL[M. Babeş, V. Marivate, K. Subramanian and M. Littman 2011]

θ Pr( )R π

Boltzmann Distribution

update in the direction of ∇ Pr( )

φ

MLIRL

Assumption: Reward function is a linear combination of a known set of features. (e.g., )

Goal: Find θ to maximize the probability of the observed trajectories.

(s,a): (state, action) pairR: reward functionφ: feature vectorπ: policyPr( ): probability of demonstrations

),(),(),R( asasasi ii

Page 19: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Task Learning from Demonstrations

Semantic Parsing

Task Abstraction

Our Approach

Page 20: Learning to Interpret Natural Language Instructions

Semantic model (ontology)

NP (w, ) → Adj (w1, ), NP (w2, ): Φi NP (w, ) → Det (w1, ), NP (w2, ): Φi

…Syntax + Semantics + Ontological Constraints

Grammar

PARSING

Semantic representationtext

Semantic Parsing• Lexicalized Well-Founded Grammars (LWFG)

[Muresan, 2006; 2008; 2012]

Page 21: Learning to Interpret Natural Language Instructions

Empty Semantic model

Grammar

PARSING

X1.is=teal, X.P1=X1, X.isa=roomteal room

Semantic Parsing

• Lexicalized Well-Founded Grammars (LWFG) [Muresan, 2006; 2008; 2012]

room1

teal

P1 uninstantiated variable

NP (w, ) → Adj (w1, ), NP (w2, ): Φi NP (w, ) → Det (w1, ), NP (w2, ): Φi

Page 22: Learning to Interpret Natural Language Instructions

Syntax/Semantics/Ontological Constraints

Grammar

PARSING

X1.is=green, X.color=X1, X.isa=room

teal room

Semantic Parsing

• Lexicalized Well-Founded Grammars (LWFG) [Muresan, 2006; 2008; 2012]

room1

color≅

Semantic model (ontology)

green

NP (w, ) → Adj (w1, ), NP (w2, ): Φi NP (w, ) → Det (w1, ), NP (w2, ): Φi

Page 23: Learning to Interpret Natural Language Instructions

Learning LWFG• [Muresan and Rambow 2007; Muresan, 2010; 2011]

Currently learning is done offline No assumption of existing semantic model

(loud noise, )

(the noise, )

Small Training Data

RELATIONAL LEARNING MODEL

LWFG

NP

NP

PP

Representative Examples

(loud noise)

(the noise)

(loud clear noise)

(the loud noise)

Generalization SetNP

NP

NP

NP

NP (w, ) → Adj (w1, ), NP (w2, ):Φi NP (w, ) → Det (w1, ), NP (w2, ):,Φi

Page 24: Learning to Interpret Natural Language Instructions

Using LWFG for Semantic Parsing

• Obtain representation with same underlying meaning despite different syntactic forms“go to the green room” vs “go to the room that is green”

–Dynamic Semantic ModelCan start with an empty semantic modelAugmented based on feedback from the TA componentpush blockToRoom, teal green

Page 25: Learning to Interpret Natural Language Instructions

“Push the star into the teal room”

Task Learning from Demonstrations

Semantic Parsing

Task Abstraction

Our Approach

Page 26: Learning to Interpret Natural Language Instructions

Task Abstraction

• Creates an abstract task to represent a set of similar grounded tasks– Specifies goal conditions and reward function in

terms of OO-MDP propositional functions

• Combines information from SP and IRL and provides feedback to both– Provides SP domain specific semantic information– Provides IRL relevant features

Page 27: Learning to Interpret Natural Language Instructions

Similar Grounded TasksGoal ConditionobjectIn(o2, l1)Other Terminal FactsisSquare(o2)isYellow(o2)isRed(l1)isGreen(l2)agentIn(o46, l2)…

Goal ConditionobjectIn(o15, l2)Other Terminal FactsisStar(o15)isYellow(o15)isGreen(l2)isRed(l3)objectIn(o31, l3)…

Some grounded tasks are similar in logical goal conditions, but different in object references and facts about those referenced objects

Page 28: Learning to Interpret Natural Language Instructions

Abstract Task Definition

• Define task conditions as a partial first order logic expression

• The expression takes propositional function parameters to complete it

)(ObjectIn o,l Locationl Object,o

)isGreen()isStar()isYellow(

)(ObjectIn

loo

lo , Locationl Object,o

Page 29: Learning to Interpret Natural Language Instructions

System Structure

Semantic Parsing Inverse Reinforcement Learning (IRL)

Task Abstraction

Page 30: Learning to Interpret Natural Language Instructions

Pilot StudyUnigram Language Model

Data: (Si, Ti).Hidden Data: zji : Pr(Rj | (Si, Ti) ).

Parameters: xkj : Pr(wk | Rj ).

Page 31: Learning to Interpret Natural Language Instructions

Pilot Study

Data: (Si, Ti).Hidden Data: zji : Pr(Rj | (Si, Ti) ).

Parameters: xkj : Pr(wk | Rj ).

Unigram Language Model

Pr(“push”|R1) Pr(“room”|R1)...

Pr(“push”|R2) Pr(“room”|R2)...

Page 32: Learning to Interpret Natural Language Instructions

Pilot StudyNew Instruction, SN : “Go to the green room.”

Pr(“push”|R1) Pr(“room”|R1)...

Pr(“push”|R2) Pr(“room”|R2)...

422

711

101.4)|Pr()|Pr(

,106.8)|Pr()|Pr(

Nk

Nk

SwkN

SwkN

RwRS

RwRS

Page 33: Learning to Interpret Natural Language Instructions

Pilot StudyNew Instruction, S’N : “Go with the star to green.”

Pr(“push”|R1) Pr(“room”|R1)...

Pr(“push”|R2) Pr(“room”|R2)...

5

'22

7

'11

101.2)|Pr()|'Pr(

,103.8)|Pr()|'Pr(

Nk

Nk

SwkN

SwkN

RwRS

RwRS

Page 34: Learning to Interpret Natural Language Instructions

Outline

• What is the problem?• Our approach• Pilot study• Conclusions and future work

Page 35: Learning to Interpret Natural Language Instructions

Conclusions

• Proposed new approach for training an artificial agent to learn to carry out complex multistep tasks specified in natural language, from pairs of instructions and demonstrations

• Showed pilot study with a simplified model

Page 36: Learning to Interpret Natural Language Instructions

Current/Future Work• SP, TA and IRL integration• High-level multistep tasks• Probabilistic LWFGs– Probabilistic domain (semantic) model– Learning/Parsing algorithms with both hard and soft

constraints• TA – add support for hierarchical task definitions

that can handle temporal conditions• IRL – find a partitioning of the execution trace

where each partition has it own reward function

Page 37: Learning to Interpret Natural Language Instructions

Thank you!