
Page 1: Integrating Learning in a Multi-Scale Agent


Integrating Learning in a Multi-Scale Agent

Ben Weber

Dissertation Defense

May 18, 2012

Page 2: Integrating Learning in a Multi-Scale Agent


Introduction

AI has a long history of using games to advance the state of the field

[Shannon 1950]

Page 3: Integrating Learning in a Multi-Scale Agent


Real-Time Strategy Games

Building human-level AI for RTS games remains an open research challenge

StarCraft II, Blizzard Entertainment

Page 4: Integrating Learning in a Multi-Scale Agent


Task Environment Properties

Property                         Chess          StarCraft       Taxi Driving
Fully vs. partially observable   Fully          Partially       Partially
Deterministic vs. stochastic     Deterministic  Deterministic*  Stochastic
Episodic vs. sequential          Sequential     Sequential      Sequential
Static vs. dynamic               Static         Dynamic         Dynamic
Discrete vs. continuous          Discrete       Continuous      Continuous
Single vs. multiagent            Multi          Multi           Multi

[Russell & Norvig 2009]

Page 5: Integrating Learning in a Multi-Scale Agent


Motivation

RTS games present complex environments and complex tasks

Professional players demonstrate a broad range of reasoning capabilities

Human behavior can be observed, emulated, and evaluated

[Langley 2011, Mateas 2002]

Page 6: Integrating Learning in a Multi-Scale Agent


Hypothesis

Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities

Page 7: Integrating Learning in a Multi-Scale Agent


Research Questions

What competencies are necessary for expert StarCraft gameplay?

Which competencies can be learned from demonstrations?

How can these competencies be integrated in a real-time agent?

Page 8: Integrating Learning in a Multi-Scale Agent


Overview

StarCraft

Multi-Scale AI

Learning from Demonstration

Integrating Learning

Evaluation

Page 9: Integrating Learning in a Multi-Scale Agent


StarCraft

Expert gameplay

300+ actions per minute (APM)

Evolving meta-game

Exhibited capabilities

Estimation

Anticipation

Adaptation

[Flash, Pro-gamer]

Page 10: Integrating Learning in a Multi-Scale Agent


StarCraft Gameplay

[Diagram: the core gameplay loop of managing the economy, expanding the tech tree, producing units, and attacking the opponent]

Page 11: Integrating Learning in a Multi-Scale Agent


Gameplay Scales in StarCraft

Scales: individual, squad, and global

Examples: supporting a siege line, worker harassment, aggressive mine placement

Page 12: Integrating Learning in a Multi-Scale Agent


State Space

The following number of states is possible, considering only unit type and location:

(Type × X × Y)^Units

States on a 256×256 tile map:

(100 × 256 × 256)^1700 > 10^11,500
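A quick sanity check of this bound (a minimal Python sketch; the unit-type count, map size, and unit count are the slide's own numbers):

```python
import math

# State-space bound from above: (Type * X * Y) ** Units.
# Slide's numbers: 100 unit types, a 256x256 tile map, 1700 units.
types, x, y, units = 100, 256, 256, 1700

# Work in log10 space; the full number has roughly 11,600 digits.
log10_states = units * math.log10(types * x * y)
print(f"states > 10^{log10_states:.0f}")  # states > 10^11588
```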

Page 13: Integrating Learning in a Multi-Scale Agent


Decision Complexity

The set of possible actions that can be executed at a particular moment:

O(2^W(A·P) + 2^T(D + S) + B(R + C))

W – number of workers

A – number of worker assignment types

P – average number of workplaces

T – number of troops

D – number of movement directions

S – number of troop stances

B – number of buildings

R – number of research options at a building

C – number of unit types a building can produce

[Aha et al. 2005]

Page 14: Integrating Learning in a Multi-Scale Agent


Decision Complexity

The set of possible actions that can be executed at a particular moment:

O(W(A·P) + T(D + S) + B(R + C))

Assumption

Unit actions can be selected independently

Resulting complexity:

50 worker units on a 256×256 tile map yield more than 1,000,000 possible actions
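A back-of-the-envelope check of that figure, under one added assumption (not from the slide): any tile of the map can be a movement target for a worker.

```python
# Worker term of the bound above: W * (A * P) actions, selected independently.
# Assumption: each worker's dominant choice is a move order, and any of the
# 256 * 256 tiles can be the target.
workers = 50
move_targets = 256 * 256

worker_actions = workers * move_targets
print(f"{worker_actions:,}")  # 3,276,800, already over 1,000,000
```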

Page 15: Integrating Learning in a Multi-Scale Agent


StarCraft

Complex gameplay

Real-world properties

Highly competitive

Sources of expert gameplay

Page 16: Integrating Learning in a Multi-Scale Agent


Research Question #1

What competencies are necessary for expert StarCraft gameplay?

Page 17: Integrating Learning in a Multi-Scale Agent


Multi-Scale AI

Multiple scales

Actions are performed across multiple levels of coordination

Interrelated tasks

Performance in each task impacts the other tasks

Real-time

Actions are performed in real time

Page 18: Integrating Learning in a Multi-Scale Agent


Reactive Planning

Provides useful mechanisms for building multi-scale agents

Advantages

Efficient behavior selection

Interleaved plan expansion and execution

Disadvantages

Lacks deliberative capabilities

[Loyall 1997, Mateas 2002]

Page 19: Integrating Learning in a Multi-Scale Agent


Agent Design

Implemented in the ABL reactive planning language

Architecture

Extension of the McCoy & Mateas integrated agent framework

Partitions gameplay into distinct competencies

Uses a blackboard for coordination

[McCoy & Mateas 2008]

Page 20: Integrating Learning in a Multi-Scale Agent


EISBot Managers

Strategy Manager

Income Manager

Production Manager

Tactics Manager

Recon Manager

[Diagram: managers mapped to gameplay tasks such as gathering resources, constructing buildings, attacking the opponent, and scouting the opponent]

Page 21: Integrating Learning in a Multi-Scale Agent


Multi-Scale Idioms

Design patterns for authoring multi-scale AI

Idioms

Message passing

Daemon behaviors

Managers

Unit subtasks

Behavior locking
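EISBot's idioms are authored in ABL; purely as an illustration, here is a minimal Python sketch of two of them, managers coordinating through message passing over a shared blackboard (the WME name echoes the diagram on the next slide; everything else is hypothetical):

```python
class Blackboard:
    """Shared working memory that managers use for message passing."""
    def __init__(self):
        self.wmes = []

    def post(self, wme_type, data):
        self.wmes.append((wme_type, data))

    def take(self, wme_type):
        for wme in self.wmes:
            if wme[0] == wme_type:
                self.wmes.remove(wme)
                return wme[1]
        return None

class StrategyManager:
    def update(self, bb):
        # Decide on a timing attack and announce it via a WME.
        bb.post("TimingAttackWME", {"target": "enemy_base"})

class TacticsManager:
    def update(self, bb):
        # React to strategy-level messages by forming and tasking a squad.
        order = bb.take("TimingAttackWME")
        if order:
            print("Tactics: attacking", order["target"])

# Daemon-style loop: each manager gets a chance to act every frame.
bb = Blackboard()
managers = [StrategyManager(), TacticsManager()]
for frame in range(2):
    for manager in managers:
        manager.update(bb)
```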

Page 22: Integrating Learning in a Multi-Scale Agent


Idioms in EISBot

[Diagram: idiom usage in EISBot. An initial_tree subgoals the Tactics, Strategy, and Income managers; behaviors include Form Squad, Squad Monitor, Squad Attack, Squad Retreat, Attack Enemy, Pump Probes, and Dragoon Dance; Timing Attack and Probe Stop WMEs illustrate message passing. Legend: subgoal, daemon behavior, message passing.]

Page 23: Integrating Learning in a Multi-Scale Agent


Multi-Scale AI

StarCraft gameplay is multi-scale

Reactive planning provides mechanisms for multi-scale reasoning

Idioms are applied in EISBot to support StarCraft gameplay

Page 24: Integrating Learning in a Multi-Scale Agent


Research Question #2

Which competencies can be learned from demonstrations?

Page 25: Integrating Learning in a Multi-Scale Agent


Learning from Demonstration

Objective

Emulate capabilities exhibited by expert players by harnessing gameplay demonstrations

Methods

Classification and regression model training

Case-based goal formulation

Parameter selection for model optimization

Page 26: Integrating Learning in a Multi-Scale Agent


Strategy Prediction

Tasks

Identify opponent build orders

Predict when buildings will be constructed

[Histogram: Spawning Pool construction timing over the first 4 minutes of game time]

[Hsieh & Sun 2008]

Page 27: Integrating Learning in a Multi-Scale Agent


Approach

Feature encoding

Each player’s actions are encoded in a single vector

Vectors are labeled using a build-order rule set

Features describe the game cycle when a unit or building type is first produced by a player

f(x) = t, the time when x is first produced by P
       0, if x was not (yet) produced by P
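A minimal sketch of this encoding in Python; the replay format (a list of (cycle, unit type) production events) and the unit-type subset are hypothetical:

```python
UNIT_TYPES = ["Pylon", "Gateway", "Zealot", "Dragoon"]  # assumed subset

def encode(actions):
    # Feature i = game cycle when type i is first produced, 0 if never.
    first_cycle = {u: 0 for u in UNIT_TYPES}
    for cycle, unit_type in actions:
        if unit_type in first_cycle and first_cycle[unit_type] == 0:
            first_cycle[unit_type] = cycle
    return [first_cycle[u] for u in UNIT_TYPES]

replay = [(760, "Pylon"), (1450, "Gateway"), (2905, "Zealot"), (3100, "Zealot")]
print(encode(replay))  # [760, 1450, 2905, 0]: Dragoon never produced
```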

Page 28: Integrating Learning in a Multi-Scale Agent


Strategy Prediction Results

[Chart: precision and recall vs. game time (0–12 minutes) for the NNge, Boosting, Rule Set, and State Lattice predictors]

Page 29: Integrating Learning in a Multi-Scale Agent


Strategy Learning

Task

Learn build-orders from demonstration

Trace Algorithm

Converts replays to a trace representation

Formulates goals based on the most similar situation:

q = argmin_{c ∈ L} distance(s, c)

g = s + (q′ − q)

[Ontañón et al. 2010]
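A minimal Python sketch of this retrieval, using the worked example from the following slides; the L1 distance is an assumption, not necessarily the algorithm's actual similarity metric:

```python
def distance(a, b):
    # Assumed L1 distance between state vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def formulate_goal(s, trace, window):
    # q = argmin over cases c in the trace of distance(s, c), restricted
    # so that a case q' one planning window ahead of q exists.
    i = min(range(len(trace) - window), key=lambda i: distance(s, trace[i]))
    q, q_prime = trace[i], trace[i + window]
    # g = s + (q' - q)
    return [sj + (qpj - qj) for sj, qj, qpj in zip(s, q, q_prime)]

trace = [[2, 0, 0.5, 1], [3, 0, 0.7, 1], [4, 1, 0.9, 1], [4, 1, 1.1, 2]]
s = [3, 0, 1, 1]
print(formulate_goal(s, trace, window=2))
# [4, 1, 1.4, 2] up to float rounding (q = T2, q' = T4)
```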

Page 30: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Example

Consider a planning window of size 2

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 31: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 1

The system retrieves the most similar case, q (here q = T2)

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 32: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 2

q′, the case one planning window (two steps) ahead of q, is retrieved (here q′ = T4)

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 33: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 3

The difference is computed: T4 − T2 = <1, 1, 0.4, 1>

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 34: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 4

g is computed:

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

g = s + (T4 – T2) = <4, 1, 1.4, 2>

Page 35: Integrating Learning in a Multi-Scale Agent


Strategy Learning Results

[Chart: prediction error (RMSE) vs. actions performed by player (0–100) for the Null, IB1, Trace, and MultiTrace models; opponent modeling with a window size of 20]

Page 36: Integrating Learning in a Multi-Scale Agent


State Estimation

Task

Estimate enemy positions given prior observations

Particle Model

Apply movement model

Remove visible particles

Reweight particles

[Thrun 2002, Bererton 2004]
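A minimal Python sketch of one particle-model update, under assumed dynamics (the random-walk movement and exponential weight decay are simplifications, not the tuned model):

```python
import random

def step(particles, visible_tiles, decay=0.95):
    """One update of a simple particle model for unseen enemy units.

    particles: list of [x, y, weight]; visible_tiles: set of (x, y).
    """
    survivors = []
    for x, y, w in particles:
        # Apply an (assumed) random-walk movement model.
        x += random.choice((-1, 0, 1))
        y += random.choice((-1, 0, 1))
        # Remove particles in tiles we can currently see (no unit there).
        if (x, y) not in visible_tiles:
            survivors.append([x, y, w * decay])  # reweight: decay confidence
    return survivors

# A unit was last seen at (10, 12); track belief about where it went.
particles = [[10, 12, 1.0] for _ in range(100)]
particles = step(particles, visible_tiles={(10, 12), (10, 11)})
```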

Page 37: Integrating Learning in a Multi-Scale Agent


Parameter Selection

Free parameters

Trajectory weights

Decay rates

State estimation is represented as an optimization problem

Input: parameter weights

Output: particle model error

Replays are used to implement a particle model error function
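A sketch of that optimization framing; the grid search and the quadratic `particle_model_error` below are illustrative stand-ins for the replay-based error function:

```python
import itertools

def particle_model_error(trajectory_weight, decay_rate):
    # Stand-in: the real function replays games, runs the particle model
    # with these parameters, and measures prediction error.
    return (trajectory_weight - 0.6) ** 2 + (decay_rate - 0.9) ** 2

# Input: parameter weights; output: particle model error. Search for the
# lowest-error setting over a coarse grid; any black-box optimizer works.
grid = [i / 10 for i in range(11)]
best = min(itertools.product(grid, grid),
           key=lambda params: particle_model_error(*params))
print("best (trajectory weight, decay rate):", best)  # (0.6, 0.9)
```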

Page 38: Integrating Learning in a Multi-Scale Agent


State Estimation Results

[Chart: threat prediction error vs. game time (0–18 minutes) for the Null Model, Perfect Tracker, Default Model, and Optimized Model]

Page 39: Integrating Learning in a Multi-Scale Agent


Learning from Demonstration

Anticipation

Classification and regression models

Adaptation

Case-based goal formulation

Estimation

Model optimization

Page 40: Integrating Learning in a Multi-Scale Agent


Research Question #3

How can these competencies be integrated in a real-time agent?

Page 41: Integrating Learning in a Multi-Scale Agent


Agent Architecture

Page 42: Integrating Learning in a Multi-Scale Agent


Integration Approaches

Augmenting working memory

External plan generation

External goal formulation

[Diagram: external components interfacing with the agent's working memory]

Page 43: Integrating Learning in a Multi-Scale Agent


Augmenting Working Memory

Supplementing working memory with additional beliefs

Page 44: Integrating Learning in a Multi-Scale Agent


External Plan Generation

Generating plans outside the scope of ABL

Page 45: Integrating Learning in a Multi-Scale Agent


External Goal Formulation

Formulating goals outside the scope of ABL
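As an illustration, all three approaches share one integration surface: an external component reads from and writes to the agent's working memory. A hypothetical Python sketch (all WME names invented):

```python
class WorkingMemory:
    """Stand-in for ABL working memory: a bag of typed WMEs."""
    def __init__(self):
        self.wmes = []

    def add(self, wme_type, **attrs):
        self.wmes.append((wme_type, attrs))

wm = WorkingMemory()

# 1. Augmenting working memory: a learned model posts extra beliefs.
wm.add("StrategyPredictionWME", opponent_strategy="two_gateway")

# 2. External plan generation: a planner posts a plan for behaviors to run.
wm.add("PlanWME", steps=["train_probe", "build_pylon", "build_gateway"])

# 3. External goal formulation: a formulator posts a goal to pursue.
wm.add("GoalWME", probes=4, pylons=1, gateways=2)
```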

Page 46: Integrating Learning in a Multi-Scale Agent


Goal-Driven Autonomy

A framework for building self-introspective agents

GDA agents monitor plan execution, detect discrepancies, and explain failures

Implementations

Hand-authored rules

Case-based reasoning

[Molineaux et al. 2010, Muñoz-Avila et al. 2010]

Page 47: Integrating Learning in a Multi-Scale Agent


GDA Subtasks

Expectation generation

Discrepancy detection

Explanation generation

Goal formulation
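These four subtasks form a monitoring loop. A minimal Python sketch of that loop, with hypothetical stand-ins for each component:

```python
def gda_step(goal, state, expect, detect, explain, formulate):
    expectation = expect(goal)                 # expectation generation
    discrepancy = detect(expectation, state)   # discrepancy detection
    if discrepancy is None:
        return goal                            # execution matches expectations
    explanation = explain(discrepancy, state)  # explanation generation
    return formulate(explanation, state)       # goal formulation

# Toy usage: expect a certain army size; react when the opponent outgrows us.
new_goal = gda_step(
    goal={"army": 20},
    state={"army": 8, "enemy_army": 15},
    expect=lambda g: g["army"],
    detect=lambda exp, s: "army_deficit" if s["army"] < exp else None,
    explain=lambda d, s: "opponent_expanded" if s["enemy_army"] > s["army"] else d,
    formulate=lambda e, s: {"army": s["enemy_army"] + 10},
)
print(new_goal)  # {'army': 25}
```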

Page 48: Integrating Learning in a Multi-Scale Agent


Implementation

Page 49: Integrating Learning in a Multi-Scale Agent


Integrating Learning

ABL agents can be interfaced with external learning components

Applying the GDA model enabled tighter coordination across capabilities

EISBot incorporates ABL behaviors, a particle model, and a GDA implementation

Page 50: Integrating Learning in a Multi-Scale Agent


Evaluation

Claim

Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities

Experiments

Ablation studies

User study

Page 51: Integrating Learning in a Multi-Scale Agent


GDA Ablation Study

Agent configurations

Base

Formulator

Predictor

GDA

Free parameters

Planning window size

Look-ahead window size

Discrepancy period

[Diagram: GDA components in EISBot. The Discrepancy Detector produces discrepancies for the Explanation Generator, which produces explanations for the Goal Formulator, whose goals feed the Goal Manager.]

Page 52: Integrating Learning in a Multi-Scale Agent


GDA Results

Overall results from the GDA experiments

Agent        Win Ratio
Base         0.73
Formulator   0.77
Predictor    0.81
GDA          0.92

Page 53: Integrating Learning in a Multi-Scale Agent


User Study

Experiment setup

Matches hosted on ICCup

3 trials

Testing script

1. Launch StarCraft

2. Connect to server

3. Host match

4. Announce experiment

[Dennis Fong, Pro-gamer]

Page 54: Integrating Learning in a Multi-Scale Agent


Performance on Tau Cross

[Chart: ICCup score vs. number of games played (0–50) on Tau Cross for the Base, Formulator, Predictor, and GDA agents]

Page 55: Integrating Learning in a Multi-Scale Agent


ICCup Results

Agent        Longinus   Python   Tau Cross   Overall
Base              942      599         669       737
Formulator        980      718        1078       925
Predictor        1111      555        1145       937
GDA               952      860        1293      1035

Page 56: Integrating Learning in a Multi-Scale Agent


EISBot Ranking

Rankings achieved by the complete GDA agent

Trial       Percentile Ranking
Longinus    32nd
Python      8th
Tau Cross   66th
Average     48th

Page 57: Integrating Learning in a Multi-Scale Agent


Evaluation

Ablation Studies

Optimized particle model

Complete GDA model

Integrating additional capabilities into EISBot improved performance

EISBot performed at the level of a competitive amateur StarCraft player

Page 58: Integrating Learning in a Multi-Scale Agent


Conclusion

Objective

Identify and realize capabilities necessary for expert-level StarCraft gameplay in an agent

Approach

Decompose gameplay

Learn capabilities from demonstrations

Integrate learned gameplay models

Evaluate versus humans and agents

Page 59: Integrating Learning in a Multi-Scale Agent


Contributions

Idioms for authoring multi-scale agents

Methods for learning from demonstration

Integration approaches for ABL agents

Page 60: Integrating Learning in a Multi-Scale Agent


Integrating Learning in a Multi-Scale Agent

Ben G. Weber

Ph.D. Candidate

Expressive Intelligence Studio

UC Santa Cruz

[email protected]

Funding

NSF Grant IIS-1018954

Page 61: Integrating Learning in a Multi-Scale Agent


References

Aha, Molineaux, & Ponsen. 2005. “Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game”, Proceedings of ICCBR.

Bererton. 2004. “State Estimation for Game AI using Particle Filters”, Proceedings of the AAAI Workshop on Challenges in Game AI.

Hsieh & Sun. 2008. “Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games”, Proceedings of IJCNN.

Langley. 2011. “Artificial Intelligence and Cognitive Systems”, AISB Quarterly.

Loyall. 1997. “Believable Agents: Building Interactive Personalities”, Ph.D. thesis, CMU.

Mateas. 2002. “Interactive Drama, Art and Artificial Intelligence”, Ph.D. thesis, CMU.

Page 62: Integrating Learning in a Multi-Scale Agent


References

McCoy & Mateas. 2008. “An Integrated Agent for Playing Real-Time Strategy Games”, Proceedings of AAAI.

Molineaux, Klenk, Aha. 2010. “Goal-Driven Autonomy in a Navy Strategy Simulation”, Proceedings of AAAI.

Muñoz-Avila, Aha, Jaidee, Klenk, Molineaux. 2010. “Applying Goal Driven Autonomy to a Team Shooter Game”, Proceedings of FLAIRS.

Ontañón, Mishra, Sugandh, Ram. 2010. “On-line Case-Based Planning”, Computational Intelligence.

Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.

Shannon. 1950. “Programming a Computer for Playing Chess”, Philosophical Magazine.

Thrun. 2002. “Particle Filters in Robotics”, Proceedings of UAI.