64
Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Embed Size (px)

Citation preview

Page 1: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Grounding Linguistic Analysis in Control Applications

S.R.K. BranavanFebruary 8, 2012

Page 2: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Language and Control Applications

Right click "My Computer" on the desktop, and click the Manage menu option.

Click Services after expanding "Services and applications"

...

Cities built on or near water sources can irrigate to increase their crop yields.

...

Microsoft WindowsWindows help document

Strategy gameGame user guide

Page 3: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Our Goal

Leverage language-control connection to:

• Learn language analysis from control feedback

• Improve control performance using textual information

Text interpretation can enable novel automation

Page 4: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Semantic Interpretation: Traditional Approach

Map text into an abstract representation

(Typical papers on semantics)

Page 5: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Semantic Interpretation: Our Approach

Map text to control actions

Enables language learning from control feedback

Build your city on grassland with a river running through it if possible.

Text

MOVE_TO (7,3)

BUILD_CITY ()

...

Control actions

Page 6: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Learning from Control Feedback

Build your city on grassland with a river running through it if possible.

TextMOVE_TO (7,3)

BUILD_CITY ()

Control actions

(7, 3) is grassland with a river

Highergame score

MOVE_TO (5,1)

BUILD_CITY ()

Control actions

(5, 1) is a desert

Lowergame score

Learn Language via Reinforcement Learning

Page 7: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Challenges

• Situational RelevanceRelevance of textual information depends on control state

“Cities built on or near water sources can irrigate to increase their crop yields, and cities near mineral resources can mine for raw materials.”

• AbstractionText can describe abstract concepts in the control application

“Build your city on grassland.”“Water is required for irrigation.”

• IncompletenessText does not provide complete information about the control application

“Build your city on grassland with a river running through it if possible.”

(what if there are no rivers nearby?)

Page 8: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Contributions

• Learn language analysis from control feedback

• Improve control performance using text information

InstructionInterpretation

ComplexGame-play

High-levelPlanning

Page 9: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Input:• Text documents• Interactive access to control application

Prior knowledge:• Text provides information useful for the control

application

Goal:• Learn to interpret the text and learn effective control

General Setup

Page 10: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Outline

1. Step-by-step imperative instructions- Learning from control feedback

2. High-level strategy descriptions- Situational relevance- Incompleteness- Learning from control feedback

3. General descriptions of world dynamics- Abstractions- Situational relevance- Incompleteness- Learning from control feedback

Page 11: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Outline

1. Step-by-step imperative instructions- Learning from control feedback

2. High-level strategy descriptions- Situational relevance- Incompleteness- Learning from control feedback

3. General descriptions of world dynamics- Abstractions- Situational relevance- Incompleteness- Learning from control feedback

Page 12: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

12

Instructions: step-by-stepdescriptions of commands

Target environment:where commandsneed to be executed

Command sequenceexecutable in theenvironment

Ou

tpu

tIn

pu

tInterpreting Instructions to Actions

Page 13: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

13

Segment text to chunks that describe individual commands

Learn translation of words to environment commands

Reorder environment commands

Instruction Interpretation: Challenges

Page 14: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Instruction Interpretation: Representation

Markov Decision Process - select text segment, translate & execute:

Page 15: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

15

Instruction Interpretation: Representation

State s = Observed Text + Observed EnvironmentAction a = Word Selection + Environment

Command

Page 16: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

16

Model Parameterization

Define policy function as a log-linear distribution:

- real valued feature function on state and action

- parameters of model

Represent each action with a feature vector:

Page 17: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

17

Reward Signal

Alternative Indication of Error:Text specifies objects not visible on the screen

Approximation: If a sentence matches no GUI labels, a preceding action is wrong

Ideal: Test for task completion

Page 18: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

18

Learning Using Reward Signal: Challenges

2. Number of candidate action sequences is very large

How can this space be effectively searched?

Use Reinforcement Learning

1. Reward can be delayed

How can reward be propagated to individual actions

Reward

Page 19: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

19

Learning Algorithm

Goal: Find θ that maximizes the expected reward

Method: Policy gradient algorithm (stochastic gradient ascent on θ)

Page 20: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

21

Windows 2000 help documentsfrom support.microsoft.com

Windows Configuration Application

Total # of documents 128

Train / development / test 70 / 18 / 40

Total # of words 5562

Vocabulary size 610

Avg. words per sentence 9.93

Avg. sentences per document 4.38

Avg. actions per document 10.37

Page 21: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Results: Command Accuracy

0% 10% 20% 30% 40% 50% 60% 70% 80%

29%

79%

79%

Majority heuristic

Control feedback

Manual supervision

% commands correctly mapped

Page 22: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Outline

1. Step-by-step imperative instructions- Learning from control feedback

2. High-level strategy descriptions- Situational relevance- Incompleteness- Learning from control feedback

3. General descriptions of world dynamics- Abstractions- Situational relevance- Incompleteness- Learning from control feedback

Page 23: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Solving Hard Decision Tasks

Civilization II Player’s GuideYou start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it

Page 24: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Solving Hard Decision Tasks

Objective: Maximize a utility function

Challenge: Finding optimal solution hard• Large decision space• Expensive simulations

Traditional solution: Manually encoded domain-knowledge

Our goal: Automatically extract required domain knowledge from text

Page 25: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

27

Adversarial Planning Problem

Civilization II : Complex multiplayer strategy game(branching factor 1020)

Traditional Approach: Monte-Carlo Search Framework

• Learn action selection policy from simulations

• Very successful in complex games like Go and Poker

Page 26: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

28

Leveraging Textual Advice: Challenges

1. Find sentences relevant to given game state.

Game state Strategy document

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …

settlercity

Page 27: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

29

Game state Strategy document

Leveraging Textual Advice: Challenges

1. Find sentences relevant to given game state.

settlercity

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …

Page 28: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

30

settlercity

settler

Game state Strategy document

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …

Leveraging Textual Advice: Challenges

1. Find sentences relevant to given game state.

Page 29: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

31

Leveraging Textual Advice: Challenges

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

2. Label sentences with predicate structure.

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

move_settlers_to () settlers_build_city ()

?

?

move_settlers_to ()

Label words as action, state or background

Page 30: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

32

Leveraging Textual Advice: Challenges

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

2. Label sentences with predicate structure.

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

move_settlers_to () settlers_build_city ()

?

?

move_settlers_to ()

Label words as action, state or background

Page 31: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

33

Leveraging Textual Advice: Challenges

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

2. Label sentences with predicate structure.

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

move_settlers_to () settlers_build_city ()

?

?

move_settlers_to ()

Label words as action, state or background

Page 32: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

34

Leveraging Textual Advice: Challenges

Build the city on plains or grassland with a river running through it if possible.

a1 – move_settlers_to (7,3)

S

a2 – settlers_build_city ()

a3 – settlers_irrigate_land ()

3. Guide action selection using relevant text

Page 33: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

35

Learning from Game Feedback

Goal: Learn from game feedback as only source of supervision.Key idea: Better parameter settings will lead to more victories.

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on plains or grassland with a river running through it if possible. In order to survive and grow …

a1

S

a2

a3

wonEnd result

a1

S

a2

a3 lostEnd result

Model params:θ1Model params:θ2

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on plains or grassland with a river running through it if possible. In order to survive and grow …

Game manual

Game manual

Page 34: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

36

Model Overview

Monte-Carlo Search Framework

• Learn action selection policy from simulations

• Very successful in complex games like Go and Poker.

Our Algorithm

• Learn text interpretation from simulation feedback

• Bias action selection policy using text

Page 35: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

37

Monte-Carlo Search

Actual Game

State 1

Simulation

Irrigate

State 1

Copy

Game lost

Copygame

???

Select actions via simulations, game and opponent can be stochastic

Page 36: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

38

3.53.53.5

Monte-Carlo SearchTry many candidate actions from current state & see how well they perform.

State 1Current game state

Game scores

Rollo

ut d

epth

0.1 0.4 3.51.2

Page 37: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

39

Current game state

Monte-Carlo SearchTry many candidate actions from current state & see how well they perform.Learn feature weights from simulation outcomes

State 1

Game scores

Rollo

ut d

epth

0.1 0.4 3.51.2

- feature function- model parameters

. . . . . . . . .

5 1 0 1 1 0 1 = 0.1

15 0 1 0 0 1 0 = 0.4

37 1 0 1 0 0 0 = 1.2

Page 38: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

40

Model Overview

Monte-Carlo Search Framework

• Learn action selection policy from simulations

Our Algorithm

• Bias action selection policy using text

• Learn text interpretation from simulation feedback

Page 39: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

41

• Identify sentence relevant to game state

• Label sentence with predicate structure

• Estimate value of candidate actions

Modeling Requirements

Build cities near rivers or ocean.

Build cities near rivers or ocean. Build cities near rivers or ocean.

Build cities near rivers or ocean.

Fortify :Irrigate : -10

. . . .

Build city :

-5

25

Page 40: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

42

Sentence Relevance

Sentence is selected as relevant

State , candidate action , document

- weight vector

- feature functionLog-linear model:

Identify sentence relevant to game state and action

1

2

3

Page 41: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

43

Word index , sentence , dependency info

- weight vector

- feature functionLog-linear model:

Predicate Structure

Select word labels based on sentence + dependency infoE.g., “Build cities near rivers or ocean.”

Predicate label = { action, state, background }

1

2

3

Page 42: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

44

- weight vector

- feature functionLinear model:

Final Q function approximation

State , candidate action

Predict expected value of candidate action

Document , relevant sentence , predicate labeling

1

2

3

Page 43: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

45

Multi-layer neural network: Each layer represents a different stage of analysis

Model Representation

Input:game state, candidate action, document text

Select most relevant sentence

Q function approximation

Predict sentence predicate structure

Predicted action value

Page 44: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

46

Parameter Estimation

Objective:Minimize mean square error between predicted utility

and observed utility

25

State

Predicted utility:

Action

Observed utility:

Game rollout

Method: Gradient descent – i.e., Backpropagation.

Page 45: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

48

Experimental Domain

Sentences: 2083

Avg. sentence words: 16.7

Vocabulary: 3638

Document:• Official game manual of Civilization II

Text Statistics:

Game:• Complex, stochastic turn-based

strategy game Civilization II.• Branching factor: 1020

At the start of the game, your

civilization consists of a single band

of wandering nomads. This is a

settlers unit. Although settlers are

capable of performing a variety of

useful tasks, your first task is to

move the settlers unit to a site that is

suitable for the construction of your

first city. Finding suitable locations

in which to build cities, especially

your first city, is one of the most

important decisions you make in the

Page 46: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

49

Experimental Setup

Game opponent:• Built-in AI of Game.• Domain knowledge rich AI, built to challenge humans.

Primary evaluation:• Games won within first 100 game steps.• Averaged over 200 independent experiments.• Avg. experiment runtime: 1.5 hours

Secondary evaluation:• Full games won.• Averaged over 100 independent experiments.• Avg. experiment runtime: 4 hours

Page 47: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

50

Results

Series1

0% 20% 40% 60%

53.7%Full model

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Page 48: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

51

Does Text Help ?

Series1

0% 20% 40% 60%

53.7%

17.3%

Full model

Game only

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Linear Q fn. approximation, No text

Page 49: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

52

Text vs. Representational Capacity

Series1

0% 20% 40% 60%

53.7%

26.1%

17.3%

Full model

Game only

Latent variable

Built-in AI 0%

Non-Linear Q fn. approximation, No text

% games won in 100 turns, averaged over 200 runs.

Page 50: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

53

Linguistic Complexity vs. Performance Gain

Series1

0% 20% 40% 60%

53.7%

46.7%

26.1%

17.3%

Full model

Sentence relevance

Game only

Latent variable

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Page 51: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

54

Results: Sentence Relevance

Problem: Sentence relevance depends on game state.States are game specific, and not known a priori!

Solution: Add known non-relevant sentences to text.E.g., sentences from the Wall Street Journal corpus.

Results: 71.8% sentence relevance accuracy…Surprisingly poor accuracy given game win rate!

Page 52: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

55

Results: Sentence Relevance

Sent

ence

rele

vanc

e ac

cura

cy

Page 53: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

57

Results: Full Games

Series1

0% 20% 40% 60%

65.4%

31.5%

24.8%

Full model

Latent variable

Percentage games won, averaged over 100 runs

Game only

Page 54: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Outline

1. Step-by-step imperative instructions- Learning from control feedback

2. High-level strategy descriptions- Situational relevance- Incompleteness- Learning from control feedback

3. General descriptions of world dynamics- Abstractions- Situational relevance- Incompleteness- Learning from control feedback

Page 55: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Solving Hard Planning Tasks

Objective: Compute plan to achieve given goal

Challenge: Exponential search space

Traditional solution: Analyze domain structure to induce sub-goals

Our goal: Use precondition information from text to guide sub-goal induction

Page 56: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Solving Hard Planning Tasks

Minecraft : Virtual world allowing tool creation and complex construction.

Page 57: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Example Precondition Relations

Seeds planted in farmland will grow to become wheat which can be harvested.

Text

seeds wheat

farmland wheat

Preconditions

Page 58: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Challenge

Seeds planted in farmland will grow to become wheat which can be harvested.

seeds wheat

farmland wheat

Text

Preconditions

Plan

Difference in level of abstraction

Page 59: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Model

Text Log-linear model

Planningtarget goal

Plan

Log-linear model

Low-level planner (FF)

Learn model parameters from planning feedback

Model precondition descriptions

Model object relations in world, and ground preconditions.

Precondition relations

Sub-goal sequence

Page 60: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Experimental Domain

Sentences: 242

Vocabulary: 979

Documents:User authored wiki articles

Text Statistics:

World:Minecraft virtual world

Tasks: 98

Avg. plan length: 35

Planning task Statistics:

Pickaxes Pickaxes are one of the most commonly used tools in the game, being required to mine all ores and many other types of blocks. Different qualities of pickaxe are required to successfully

Page 61: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Results

Method % plans solved

Low-level planner (FF) 40.8

No text 69.4

Full model 80.2

Manual text connections 84.7

Page 62: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Results: Text Analysis

Page 63: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Related Work

• Instruction interpretationLearned from manual supervision

• Game playingAction selection based only on game state

• High-level planningBased on analysis of world dynamics

Page 64: Grounding Linguistic Analysis in Control Applications S.R.K. Branavan February 8, 2012

Contributions

• Learn language analysis from control feedback

• Improve control performance using text information

InstructionInterpretation

ComplexGame-play

High-levelPlanning