
Page 1: Integrating Learning in a Multi-Scale Agent


Integrating Learning in a Multi-Scale Agent

Ben Weber

Dissertation Defense

May 18, 2012

Page 2: Integrating Learning in a Multi-Scale Agent


Introduction

AI has a long history of using games to advance the state of the field

[Shannon 1950]

Page 3: Integrating Learning in a Multi-Scale Agent


Real-Time Strategy Games

Building human-level AI for RTS games remains an open research challenge

StarCraft II, Blizzard Entertainment

Page 4: Integrating Learning in a Multi-Scale Agent


Task Environment Properties

Property                         Chess          StarCraft       Taxi Driving
Fully vs. partially observable   Fully          Partially       Partially
Deterministic vs. stochastic     Deterministic  Deterministic*  Stochastic
Episodic vs. sequential          Sequential     Sequential      Sequential
Static vs. dynamic               Static         Dynamic         Dynamic
Discrete vs. continuous          Discrete       Continuous      Continuous
Single vs. multiagent            Multi          Multi           Multi

[Russell & Norvig 2009]

Page 5: Integrating Learning in a Multi-Scale Agent


Motivation

RTS games present complex environments and complex tasks

Professional players demonstrate a broad range of reasoning capabilities

Human behavior can be observed, emulated, and evaluated

[Langley 2011, Mateas 2002]

Page 6: Integrating Learning in a Multi-Scale Agent


Hypothesis

Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities

Page 7: Integrating Learning in a Multi-Scale Agent


Research Questions

What competencies are necessary for expert StarCraft gameplay?

Which competencies can be learned from demonstrations?

How can these competencies be integrated in a real-time agent?

Page 8: Integrating Learning in a Multi-Scale Agent


Overview

StarCraft

Multi-Scale AI

Learning from Demonstration

Integrating Learning

Evaluation

Page 9: Integrating Learning in a Multi-Scale Agent


StarCraft

Expert gameplay

300+ actions per minute (APM)

Evolving meta-game

Exhibited capabilities

Estimation

Anticipation

Adaptation

[Flash, Pro-gamer]

Page 10: Integrating Learning in a Multi-Scale Agent


StarCraft Gameplay

[Diagram: the core gameplay loop of managing the economy, expanding the tech tree, producing units, and attacking the opponent]

Page 11: Integrating Learning in a Multi-Scale Agent


Gameplay Scales in StarCraft

Scales: individual, squad, and global

Examples: supporting a siege line, worker harassment, aggressive mine placement

Page 12: Integrating Learning in a Multi-Scale Agent


State Space

The following number of states is possible, considering only unit type and location:

(Type × X × Y)^Units

States on a 256×256 tile map:

(100 × 256 × 256)^1700 > 10^11,500
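A quick sanity check of this bound (a minimal Python sketch; the unit-type count, map size, and unit count are the slide's own numbers):

```python
import math

# State-space bound from above: (Type * X * Y) ** Units.
# Slide's numbers: 100 unit types, a 256x256 tile map, 1700 units.
types, x, y, units = 100, 256, 256, 1700

# Work in log10 space; the full number has roughly 11,600 digits.
log10_states = units * math.log10(types * x * y)
print(f"states > 10^{log10_states:.0f}")  # states > 10^11588
```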

Page 13: Integrating Learning in a Multi-Scale Agent


Decision Complexity

The set of possible actions that can be executed at a particular moment:

O(2^W(A·P) + 2^T(D + S) + B(R + C))

W – number of workers

A – number of worker assignment types

P – average number of workplaces

T – number of troops

D – number of movement directions

S – number of troop stances

B – number of buildings

R – number of research options at a building

C – number of unit types a building can produce

[Aha et al. 2005]

Page 14: Integrating Learning in a Multi-Scale Agent


Decision Complexity

The set of possible actions that can be executed at a particular moment:

O(W(A·P) + T(D + S) + B(R + C))

Assumption

Unit actions can be selected independently

Resulting complexity:

50 worker units on a 256×256 tile map yield more than 1,000,000 possible actions
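A back-of-the-envelope check of that figure, under one added assumption (not from the slide): any tile of the map can be a movement target for a worker.

```python
# Worker term of the bound above: W * (A * P) actions, selected independently.
# Assumption: each worker's dominant choice is a move order, and any of the
# 256 * 256 tiles can be the target.
workers = 50
move_targets = 256 * 256

worker_actions = workers * move_targets
print(f"{worker_actions:,}")  # 3,276,800, already over 1,000,000
```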

Page 15: Integrating Learning in a Multi-Scale Agent


StarCraft

Complex gameplay

Real-world properties

Highly competitive

Sources of expert gameplay

Page 16: Integrating Learning in a Multi-Scale Agent


Research Question #1

What competencies are necessary for expert StarCraft gameplay?

Page 17: Integrating Learning in a Multi-Scale Agent


Multi-Scale AI

Multiple scales

Actions are performed across multiple levels of coordination

Interrelated tasks

Performance in each task impacts the other tasks

Real-time

Actions are performed in real time

Page 18: Integrating Learning in a Multi-Scale Agent


Reactive Planning

Provides useful mechanisms for building multi-scale agents

Advantages

Efficient behavior selection

Interleaved plan expansion and execution

Disadvantages

Lacks deliberative capabilities

[Loyall 1997, Mateas 2002]

Page 19: Integrating Learning in a Multi-Scale Agent


Agent Design

Implemented in the ABL reactive planning language

Architecture

Extension of the McCoy & Mateas integrated agent framework

Partitions gameplay into distinct competencies

Uses a blackboard for coordination

[McCoy & Mateas 2008]

Page 20: Integrating Learning in a Multi-Scale Agent


EISBot Managers

Strategy Manager

Income Manager

Production Manager

Tactics Manager

Recon Manager

[Diagram: managers mapped to gameplay tasks such as gathering resources, constructing buildings, attacking the opponent, and scouting the opponent]

Page 21: Integrating Learning in a Multi-Scale Agent


Multi-Scale Idioms

Design patterns for authoring multi-scale AI

Idioms

Message passing

Daemon behaviors

Managers

Unit subtasks

Behavior locking
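EISBot's idioms are authored in ABL; purely as an illustration, here is a minimal Python sketch of two of them, managers coordinating through message passing over a shared blackboard (the WME name echoes the diagram on the next slide; everything else is hypothetical):

```python
class Blackboard:
    """Shared working memory that managers use for message passing."""
    def __init__(self):
        self.wmes = []

    def post(self, wme_type, data):
        self.wmes.append((wme_type, data))

    def take(self, wme_type):
        for wme in self.wmes:
            if wme[0] == wme_type:
                self.wmes.remove(wme)
                return wme[1]
        return None

class StrategyManager:
    def update(self, bb):
        # Decide on a timing attack and announce it via a WME.
        bb.post("TimingAttackWME", {"target": "enemy_base"})

class TacticsManager:
    def update(self, bb):
        # React to strategy-level messages by forming and tasking a squad.
        order = bb.take("TimingAttackWME")
        if order:
            print("Tactics: attacking", order["target"])

# Daemon-style loop: each manager gets a chance to act every frame.
bb = Blackboard()
managers = [StrategyManager(), TacticsManager()]
for frame in range(2):
    for manager in managers:
        manager.update(bb)
```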

Page 22: Integrating Learning in a Multi-Scale Agent


Idioms in EISBot

[Diagram: idiom usage in EISBot. An initial_tree subgoals the Tactics, Strategy, and Income managers; behaviors include Form Squad, Squad Monitor, Squad Attack, Squad Retreat, Attack Enemy, Pump Probes, and Dragoon Dance; Timing Attack and Probe Stop WMEs illustrate message passing. Legend: subgoal, daemon behavior, message passing.]

Page 23: Integrating Learning in a Multi-Scale Agent


Multi-Scale AI

StarCraft gameplay is multi-scale

Reactive planning provides mechanisms for multi-scale reasoning

Idioms are applied in EISBot to support StarCraft gameplay

Page 24: Integrating Learning in a Multi-Scale Agent


Research Question #2

Which competencies can be learned from demonstrations?

Page 25: Integrating Learning in a Multi-Scale Agent


Learning from Demonstration

Objective

Emulate capabilities exhibited by expert players by harnessing gameplay demonstrations

Methods

Classification and regression model training

Case-based goal formulation

Parameter selection for model optimization

Page 26: Integrating Learning in a Multi-Scale Agent


Strategy Prediction

Tasks

Identify opponent build orders

Predict when buildings will be constructed

[Histogram: Spawning Pool construction timing over the first 4 minutes of game time]

[Hsieh & Sun 2008]

Page 27: Integrating Learning in a Multi-Scale Agent


Approach

Feature encoding

Each player’s actions are encoded in a single vector

Vectors are labeled using a build-order rule set

Features describe the game cycle when a unit or building type is first produced by a player

f(x) = t, the time when x is first produced by P
       0, if x was not (yet) produced by P
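A minimal sketch of this encoding in Python; the replay format (a list of (cycle, unit type) production events) and the unit-type subset are hypothetical:

```python
UNIT_TYPES = ["Pylon", "Gateway", "Zealot", "Dragoon"]  # assumed subset

def encode(actions):
    # Feature i = game cycle when type i is first produced, 0 if never.
    first_cycle = {u: 0 for u in UNIT_TYPES}
    for cycle, unit_type in actions:
        if unit_type in first_cycle and first_cycle[unit_type] == 0:
            first_cycle[unit_type] = cycle
    return [first_cycle[u] for u in UNIT_TYPES]

replay = [(760, "Pylon"), (1450, "Gateway"), (2905, "Zealot"), (3100, "Zealot")]
print(encode(replay))  # [760, 1450, 2905, 0]: Dragoon never produced
```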

Page 28: Integrating Learning in a Multi-Scale Agent


Strategy Prediction Results

[Chart: precision and recall vs. game time (0–12 minutes) for the NNge, Boosting, Rule Set, and State Lattice predictors]

Page 29: Integrating Learning in a Multi-Scale Agent


Strategy Learning

Task

Learn build-orders from demonstration

Trace Algorithm

Converts replays to a trace representation

Formulates goals based on the most similar situation:

q = argmin_{c ∈ L} distance(s, c)

g = s + (q′ − q)

[Ontañón et al. 2010]
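A minimal Python sketch of this retrieval, using the worked example from the following slides; the L1 distance is an assumption, not necessarily the algorithm's actual similarity metric:

```python
def distance(a, b):
    # Assumed L1 distance between state vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def formulate_goal(s, trace, window):
    # q = argmin over cases c in the trace of distance(s, c), restricted
    # so that a case q' one planning window ahead of q exists.
    i = min(range(len(trace) - window), key=lambda i: distance(s, trace[i]))
    q, q_prime = trace[i], trace[i + window]
    # g = s + (q' - q)
    return [sj + (qpj - qj) for sj, qj, qpj in zip(s, q, q_prime)]

trace = [[2, 0, 0.5, 1], [3, 0, 0.7, 1], [4, 1, 0.9, 1], [4, 1, 1.1, 2]]
s = [3, 0, 1, 1]
print(formulate_goal(s, trace, window=2))
# [4, 1, 1.4, 2] up to float rounding (q = T2, q' = T4)
```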

Page 30: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Example

Consider a planning window of size 2

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 31: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 1

The system retrieves the most similar case, q (here q = T2)

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 32: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 2

q′, the case one planning window (two steps) ahead of q, is retrieved (here q′ = T4)

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 33: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 3

The difference is computed: T4 − T2 = <1, 1, 0.4, 1>

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

Page 34: Integrating Learning in a Multi-Scale Agent


Trace Retrieval: Step 4

g is computed:

S = <3, 0, 1, 1>

T1 = <2, 0, 0.5, 1>

T2 = <3, 0, 0.7, 1>

T3 = <4, 1, 0.9, 1>

T4 = <4, 1, 1.1, 2>

g = s + (T4 – T2) = <4, 1, 1.4, 2>

Page 35: Integrating Learning in a Multi-Scale Agent


Strategy Learning Results

[Chart: prediction error (RMSE) vs. actions performed by player (0–100) for the Null, IB1, Trace, and MultiTrace models; opponent modeling with a window size of 20]

Page 36: Integrating Learning in a Multi-Scale Agent


State Estimation

Task

Estimate enemy positions given prior observations

Particle Model

Apply movement model

Remove visible particles

Reweight particles

[Thrun 2002, Bererton 2004]
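A minimal Python sketch of one particle-model update, under assumed dynamics (the random-walk movement and exponential weight decay are simplifications, not the tuned model):

```python
import random

def step(particles, visible_tiles, decay=0.95):
    """One update of a simple particle model for unseen enemy units.

    particles: list of [x, y, weight]; visible_tiles: set of (x, y).
    """
    survivors = []
    for x, y, w in particles:
        # Apply an (assumed) random-walk movement model.
        x += random.choice((-1, 0, 1))
        y += random.choice((-1, 0, 1))
        # Remove particles in tiles we can currently see (no unit there).
        if (x, y) not in visible_tiles:
            survivors.append([x, y, w * decay])  # reweight: decay confidence
    return survivors

# A unit was last seen at (10, 12); track belief about where it went.
particles = [[10, 12, 1.0] for _ in range(100)]
particles = step(particles, visible_tiles={(10, 12), (10, 11)})
```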

Page 37: Integrating Learning in a Multi-Scale Agent


Parameter Selection

Free parameters

Trajectory weights

Decay rates

State estimation is represented as an optimization problem

Input: parameter weights

Output: particle model error

Replays are used to implement a particle model error function
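A sketch of that optimization framing; the grid search and the quadratic `particle_model_error` below are illustrative stand-ins for the replay-based error function:

```python
import itertools

def particle_model_error(trajectory_weight, decay_rate):
    # Stand-in: the real function replays games, runs the particle model
    # with these parameters, and measures prediction error.
    return (trajectory_weight - 0.6) ** 2 + (decay_rate - 0.9) ** 2

# Input: parameter weights; output: particle model error. Search for the
# lowest-error setting over a coarse grid; any black-box optimizer works.
grid = [i / 10 for i in range(11)]
best = min(itertools.product(grid, grid),
           key=lambda params: particle_model_error(*params))
print("best (trajectory weight, decay rate):", best)  # (0.6, 0.9)
```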

Page 38: Integrating Learning in a Multi-Scale Agent


State Estimation Results

[Chart: threat prediction error vs. game time (0–18 minutes) for the Null Model, Perfect Tracker, Default Model, and Optimized Model]

Page 39: Integrating Learning in a Multi-Scale Agent


Learning from Demonstration

Anticipation

Classification and regression models

Adaptation

Case-based goal formulation

Estimation

Model optimization

Page 40: Integrating Learning in a Multi-Scale Agent


Research Question #3

How can these competencies be integrated in a real-time agent?

Page 41: Integrating Learning in a Multi-Scale Agent


Agent Architecture

Page 42: Integrating Learning in a Multi-Scale Agent


Integration Approaches

Augmenting working memory

External plan generation

External goal formulation

[Diagram: external components interfacing with the agent's working memory]

Page 43: Integrating Learning in a Multi-Scale Agent


Augmenting Working Memory

Supplementing working memory with additional beliefs

Page 44: Integrating Learning in a Multi-Scale Agent


External Plan Generation

Generating plans outside the scope of ABL

Page 45: Integrating Learning in a Multi-Scale Agent


External Goal Formulation

Formulating goals outside the scope of ABL
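As an illustration, all three approaches share one integration surface: an external component reads from and writes to the agent's working memory. A hypothetical Python sketch (all WME names invented):

```python
class WorkingMemory:
    """Stand-in for ABL working memory: a bag of typed WMEs."""
    def __init__(self):
        self.wmes = []

    def add(self, wme_type, **attrs):
        self.wmes.append((wme_type, attrs))

wm = WorkingMemory()

# 1. Augmenting working memory: a learned model posts extra beliefs.
wm.add("StrategyPredictionWME", opponent_strategy="two_gateway")

# 2. External plan generation: a planner posts a plan for behaviors to run.
wm.add("PlanWME", steps=["train_probe", "build_pylon", "build_gateway"])

# 3. External goal formulation: a formulator posts a goal to pursue.
wm.add("GoalWME", probes=4, pylons=1, gateways=2)
```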

Page 46: Integrating Learning in a Multi-Scale Agent


Goal-Driven Autonomy

A framework for building self-introspective agents

GDA agents monitor plan execution, detect discrepancies, and explain failures

Implementations

Hand-authored rules

Case-based reasoning

[Molineaux et al. 2010, Muñoz-Avila et al. 2010]

Page 47: Integrating Learning in a Multi-Scale Agent


GDA Subtasks

Expectation generation

Discrepancy detection

Explanation generation

Goal formulation
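These four subtasks form a monitoring loop. A minimal Python sketch of that loop, with hypothetical stand-ins for each component:

```python
def gda_step(goal, state, expect, detect, explain, formulate):
    expectation = expect(goal)                 # expectation generation
    discrepancy = detect(expectation, state)   # discrepancy detection
    if discrepancy is None:
        return goal                            # execution matches expectations
    explanation = explain(discrepancy, state)  # explanation generation
    return formulate(explanation, state)       # goal formulation

# Toy usage: expect a certain army size; react when the opponent outgrows us.
new_goal = gda_step(
    goal={"army": 20},
    state={"army": 8, "enemy_army": 15},
    expect=lambda g: g["army"],
    detect=lambda exp, s: "army_deficit" if s["army"] < exp else None,
    explain=lambda d, s: "opponent_expanded" if s["enemy_army"] > s["army"] else d,
    formulate=lambda e, s: {"army": s["enemy_army"] + 10},
)
print(new_goal)  # {'army': 25}
```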

Page 48: Integrating Learning in a Multi-Scale Agent


Implementation

Page 49: Integrating Learning in a Multi-Scale Agent


Integrating Learning

ABL agents can be interfaced with external learning components

Applying the GDA model enabled tighter coordination across capabilities

EISBot incorporates ABL behaviors, a particle model, and a GDA implementation

Page 50: Integrating Learning in a Multi-Scale Agent


Evaluation

Claim

Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities

Experiments

Ablation studies

User study

Page 51: Integrating Learning in a Multi-Scale Agent


GDA Ablation Study

Agent configurations

Base

Formulator

Predictor

GDA

Free parameters

Planning window size

Look-ahead window size

Discrepancy period

[Diagram: GDA components in EISBot. The Discrepancy Detector produces discrepancies for the Explanation Generator, which produces explanations for the Goal Formulator, whose goals feed the Goal Manager.]

Page 52: Integrating Learning in a Multi-Scale Agent


GDA Results

Overall results from the GDA experiments

Agent        Win Ratio
Base         0.73
Formulator   0.77
Predictor    0.81
GDA          0.92

Page 53: Integrating Learning in a Multi-Scale Agent


User Study

Experiment setup

Matches hosted on ICCup

3 trials

Testing script

1. Launch StarCraft

2. Connect to server

3. Host match

4. Announce experiment

[Dennis Fong, Pro-gamer]

Page 54: Integrating Learning in a Multi-Scale Agent


Performance on Tau Cross

[Chart: ICCup score vs. number of games played (0–50) on Tau Cross for the Base, Formulator, Predictor, and GDA agents]

Page 55: Integrating Learning in a Multi-Scale Agent


ICCup Results

Agent        Longinus   Python   Tau Cross   Overall
Base              942      599         669       737
Formulator        980      718        1078       925
Predictor        1111      555        1145       937
GDA               952      860        1293      1035

Page 56: Integrating Learning in a Multi-Scale Agent


EISBot Ranking

Rankings achieved by the complete GDA agent

Trial       Percentile Ranking
Longinus    32nd
Python      8th
Tau Cross   66th
Average     48th

Page 57: Integrating Learning in a Multi-Scale Agent


Evaluation

Ablation Studies

Optimized particle model

Complete GDA model

Integrating additional capabilities into EISBot improved performance

EISBot performed at the level of a competitive amateur StarCraft player

Page 58: Integrating Learning in a Multi-Scale Agent


Conclusion

Objective

Identify and realize capabilities necessary for expert-level StarCraft gameplay in an agent

Approach

Decompose gameplay

Learn capabilities from demonstrations

Integrate learned gameplay models

Evaluate versus humans and agents

Page 59: Integrating Learning in a Multi-Scale Agent


Contributions

Idioms for authoring multi-scale agents

Methods for learning from demonstration

Integration approaches for ABL agents

Page 60: Integrating Learning in a Multi-Scale Agent


Integrating Learning in a Multi-Scale Agent

Ben G. Weber

Ph.D. Candidate

Expressive Intelligence Studio

UC Santa Cruz

[email protected]

Funding

NSF Grant IIS-1018954

Page 61: Integrating Learning in a Multi-Scale Agent


References

Aha, Molineaux, & Ponsen. 2005. “Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game”, Proceedings of ICCBR.

Bererton. 2004. “State Estimation for Game AI using Particle Filters”, Proceedings of the AAAI Workshop on Challenges in Game AI.

Hsieh & Sun. 2008. “Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games”, Proceedings of IJCNN.

Langley. 2011. “Artificial Intelligence and Cognitive Systems”, AISB Quarterly.

Loyall. 1997. “Believable Agents: Building Interactive Personalities”, Ph.D. thesis, CMU.

Mateas. 2002. “Interactive Drama, Art and Artificial Intelligence”, Ph.D. thesis, CMU.

Page 62: Integrating Learning in a Multi-Scale Agent


References

McCoy & Mateas. 2008. “An Integrated Agent for Playing Real-Time Strategy Games”, Proceedings of AAAI.

Molineaux, Klenk, Aha. 2010. “Goal-Driven Autonomy in a Navy Strategy Simulation”, Proceedings of AAAI.

Muñoz-Avila, Aha, Jaidee, Klenk, Molineaux. 2010. “Applying Goal Driven Autonomy to a Team Shooter Game”, Proceedings of FLAIRS.

Ontañón, Mishra, Sugandh, Ram. 2010. “On-line Case-Based Planning”, Computational Intelligence.

Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.

Shannon. 1950. “Programming a Computer for Playing Chess”, Philosophical Magazine.

Thrun. 2002. “Particle Filters in Robotics”, Proceedings of UAI.