IJCNN, International Joint Conference on Neural Networks, San Jose 2011


Pawel Raif, Silesian University of Technology, Poland

Janusz A. Starzyk, Ohio University, USA

Motivated Learning in Autonomous Systems

Outline

• Reinforcement Learning (RL)
• Goal Creation System (GCS)
  – yields self-organizing pain-based network
• Motivated Learning (ML) as a combination of RL + GCS
• Simulation Results
• Possible Applications of ML

Machine Learning Methods

Machine learning:
• supervised learning
• unsupervised learning
• corrective learning
• reinforcement learning

PROBLEMS IN „REAL WORLD” APPLICATIONS like in AUTONOMOUS SYSTEMS:
• „curse of dimensionality”
• lack of motivation for development

[Diagram labels: hierarchical RL, intrinsic motivation, „top-down approach”, „bottom-up approach”]

Reinforcement Learning
learning through interaction with the environment

[Diagram: RL agent exchanging action (a), state (s) and reward (r) with the ENVIRONMENT]
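The interaction loop above can be sketched with tabular Q-learning. This is only an illustration: the five-cell corridor, the reward placement, and the learning parameters are assumptions chosen for the example, not the setup used in the talk.

```python
import random

N_STATES = 5          # assumed toy corridor: cells 0..4, reward in cell 4
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[s])                          # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s2, r, done = step(s, ACTIONS[a])             # agent sends a, receives s and r
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# Greedy action index per non-terminal cell (1 means "step right").
greedy = [max(range(2), key=lambda i: q[s][i]) for s in range(N_STATES - 1)]
```

After training, the greedy policy steps right in every non-terminal cell, because the learned values decay with distance from the rewarded cell.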

Motivated Learning

ML can combine an internal goal creation system (GCS) and reinforcement learning (RL).

Motivated learning (ML) is need-based motivation, goal creation and learning in an embodied agent. An agent creates a hierarchy of goals based on the primitive need signals. It receives internal rewards for satisfying its goals (both primitive and abstract). ML applies to embodied intelligence (EI) working in a hostile environment.

[Diagram: ML agent composed of GC and RL blocks; labels: action, state, reward, GOALS (motivations)]

Motivated Learning – the main IDEA

… intrinsic motivations created by learning machines.

An intelligent agent learns how to survive in a hostile environment.

How to motivate a machine?

We suggest that the hostility of the environment is the most effective motivational factor.

Assumptions

1. ML agent is independent: it can act autonomously in its environment and is able to choose its own way of development.

2. ML agent’s interface to the environment is the same as RL agent’s.

3. Environment is hostile to the agent.

4. Hostility may be active or passive (depleted resources).

5. Environment is fully observable.

Goal Creation System
Neural self-organizing pain-based structures

Goal creation scheme
• a primitive pain is directly sensed
• an abstract pain is introduced by solving a lower level pain
• thresholded curiosity-based pain

Motivations and selection of a goal
• Motivations act like desires in a BDI agent
• WTA competition selects the motivation
• another WTA selects the goal
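A minimal sketch of the two winner-take-all (WTA) stages follows. The pain names, the weight table `w_pg`, the threshold value, and the way curiosity is modeled (a pain fixed at the threshold level) are all assumptions made for illustration; the talk does not specify them.

```python
def winner_take_all(signals):
    """Return the name of the strongest signal (first one wins a tie)."""
    return max(signals, key=signals.get)

def select_motivation(pains, threshold=0.3):
    """First WTA: the dominant pain becomes the current motivation."""
    # Assumption: pains at or below the threshold do not compete, and a
    # curiosity pain fixed at the threshold level wins when nothing else does.
    competing = {name: level for name, level in pains.items() if level > threshold}
    competing["curiosity"] = threshold
    return winner_take_all(competing)

def select_goal(motivation, w_pg):
    """Second WTA: pick the goal most strongly linked to the winning pain."""
    return winner_take_all(w_pg[motivation])

# Hypothetical pain levels and pain-to-goal weights.
pains = {"low sugar": 0.8, "low food supplies": 0.5}
w_pg = {
    "low sugar": {"eat": 0.9, "buy": 0.1},
    "low food supplies": {"eat": 0.0, "buy": 0.7},
    "curiosity": {"explore": 1.0},
}
motivation = select_motivation(pains)
goal = select_goal(motivation, w_pg)
```

With these numbers the strongest pain, "low sugar", wins the first competition and "eat" wins the second. When every pain is below the threshold, curiosity wins and the agent explores.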

[Diagram: pain-based goal creation network. Node labels: S1, S2, Sk, B1, B2, P1, P2, Pp, M, M1, M2, G, UA; weight labels: wBP1, wBP2, wPG, wP1G, wPpG; two WTA blocks]

Internal goals
simple linear hierarchy between different goals

Hierarchy of resources (and possible agent’s goals), from the least abstract (Food) to the most abstract (Office):

SENSOR    MOTOR      INCREASE        DECREASE
Food      Eat        Sugar level     Food supplies
Grocery   Buy        Food supplies   Money amount
Bank      Withdraw   Money amount    Bank account
Office    Work       Bank account    Working possibilities
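The SENSOR / MOTOR / INCREASE / DECREASE table can be read as a chain of dependencies: each action restores one resource at the cost of the next. A small sketch of that reading, where the `Rule` type and `plan_chain` helper are hypothetical names introduced for this example:

```python
from typing import NamedTuple

class Rule(NamedTuple):
    sensor: str      # what the agent perceives
    motor: str       # action performed on it
    increases: str   # resource replenished by the action
    decreases: str   # resource consumed by the action

# Rows taken from the table on the slide.
RULES = [
    Rule("Food", "Eat", "Sugar level", "Food supplies"),
    Rule("Grocery", "Buy", "Food supplies", "Money amount"),
    Rule("Bank", "Withdraw", "Money amount", "Bank account"),
    Rule("Office", "Work", "Bank account", "Working possibilities"),
]

def plan_chain(resource):
    """List the actions needed to restore `resource` and everything it consumes."""
    by_increase = {rule.increases: rule for rule in RULES}
    chain = []
    while resource in by_increase:
        rule = by_increase[resource]
        chain.append(rule.motor)
        resource = rule.decreases   # restoring one resource depletes the next
    return chain

sugar_chain = plan_chain("Sugar level")   # ['Eat', 'Buy', 'Withdraw', 'Work']
```

The chain stops at "Working possibilities", which no action in the table replenishes.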

Resources are distributed all over the „grid world”.

[Figure: grid world with numbered resource locations]

Modified „grid world”

This environment is:
• Complex,
• Dynamically changing,
• Fully observable.

The agent must locate resources and learn how to use them.

[Diagram labels: Environment, Perception of resources, Internal need signals]

Resources present in the environment can be used to satisfy the agent’s needs.

Subjective sense of „lack of resources”

As the agent discovers useful resources and their dependencies, the learned hierarchy of internal goals comes to express the complexity of the environment.


Relationships between internal goals

Relationships between internal goals do not have to form a linear hierarchy. They may constitute a tree structure or a complex network of resource dependencies.

As the agent discovers further resources and their dependencies, the complexity of its internal goal network grows. BUT each system may have unique experiences (reflecting its personal history of development).

[Diagram: goal network spanning from the designer’s specified needs (need1, need2, need3) to the top level resources]
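One way to picture such a network is as a directed graph from needs to the resources that satisfy them. The sketch below assigns each node an abstraction level (its longest distance from a designer-specified need). The graph itself, with resources A to D, is entirely hypothetical, and the function assumes the graph has no cycles.

```python
# Hypothetical dependency network: node -> resources required to satisfy it.
REQUIRES = {
    "need1": ["A"],
    "need2": ["A", "B"],
    "need3": ["B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
}

def abstraction_levels(requires):
    """Level 0 = designer-specified needs; deeper resources get higher levels."""
    level = {}

    def visit(node, depth):
        # Keep the longest path found so far; assumes an acyclic graph.
        if depth > level.get(node, -1):
            level[node] = depth
            for child in requires.get(node, []):
                visit(child, depth + 1)

    required = {child for children in requires.values() for child in children}
    for node in requires:
        if node not in required:      # nothing depends on it: a primitive need
            visit(node, 0)
    return level

levels = abstraction_levels(REQUIRES)
top_level = [node for node, deps in REQUIRES.items() if not deps]
```

Here the needs sit at level 0, A and B at level 1, and the top level resources C and D at level 2. Two agents with different histories would simply hold different `REQUIRES` tables.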

Experiment that combines ML & RL

Every resource discovered by the agent becomes a potential goal and is assigned a value function „level”.

Goal Creation System establishes new goals and switches agent’s activity between them.

RL algorithm learns value functions on different levels.
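A minimal sketch of this combination, assuming tabular Q-learning as the RL algorithm and "largest pain wins" as the switching rule. The class and method names are hypothetical, and the experiment in the talk may differ in both respects.

```python
from collections import defaultdict

class MotivatedAgent:
    """Keeps one Q-table ("level") per discovered resource/goal."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.q = {}   # goal name -> Q-table mapping (state, action) -> value

    def discover(self, resource):
        """A newly discovered resource becomes a potential goal."""
        self.q.setdefault(resource, defaultdict(float))

    def active_goal(self, pains):
        """Goal switching: follow the strongest pain among known goals."""
        known = {goal: p for goal, p in pains.items() if goal in self.q}
        return max(known, key=known.get)

    def update(self, goal, s, a, r, s2, actions):
        """Q-learning update applied only to the active goal's table."""
        table = self.q[goal]
        best_next = max(table[(s2, b)] for b in actions)
        table[(s, a)] += self.alpha * (r + self.gamma * best_next - table[(s, a)])

agent = MotivatedAgent()
agent.discover("food")
agent.discover("money")
goal = agent.active_goal({"food": 0.2, "money": 0.7})
agent.update(goal, s=0, a="right", r=1.0, s2=1, actions=["left", "right"])
```

Only the table of the active goal ("money" here) changes; the "food" table is untouched until its pain dominates.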

Experiment Results: switching between goals at the beginning … and at the end.

Initially the agent uses many iterations to reach a goal (red dots).

Sometimes it abandons the goal when another pain dominates.

Final runs are shorter and more successful.

Experiment Results: Comparing Primitive Pain Levels of RL & ML

Moving average of the primitive pain signal.

Initially the RL agent learns better.

Its performance deteriorates as the resources are depleted.
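For reference, a moving average of a pain signal can be computed as in the sketch below. The window length and the sample values are assumptions; the talk does not state which window was used.

```python
from collections import deque

def moving_average(signal, window):
    """Average of the last `window` samples (fewer at the start of the signal)."""
    buf, total, out = deque(), 0.0, []
    for x in signal:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()   # drop the sample that left the window
        out.append(total / len(buf))
    return out

smoothed = moving_average([1, 2, 3, 4], window=2)   # [1.0, 1.5, 2.5, 3.5]
```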

Experiment Results: effectiveness in terms of cumulative reward

Reward determined by the designer of the experiment.

[Figure: cumulative reward]

Reinforcement Learning
• Single value function
  – Various objectives
• Measurable rewards
• Predictable
• Objectives set by designer
• Maximizes the reward
  – Potentially unstable
• Learning effort increases with complexity
• Always active

Motivated Learning
• Multiple value functions
  – One for each goal
• Internal rewards
• Unpredictable
• Sets its own objectives
• Solves minimax problem
  – Always stable
• Learns better in complex environment than RL
• Acts when needed
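One possible reading of "maximizes the reward" versus "solves minimax problem" is shown below: a total-based chooser minimizes summed pain, while a minimax chooser minimizes the worst single pain. The predicted pain levels are made up for the example, and this interpretation of the minimax criterion is an assumption.

```python
def total_based_action(predicted_pains):
    """Pick the action with the smallest summed pain (largest total reward)."""
    return min(predicted_pains, key=lambda a: sum(predicted_pains[a].values()))

def minimax_action(predicted_pains):
    """Pick the action whose worst single pain is smallest."""
    return min(predicted_pains, key=lambda a: max(predicted_pains[a].values()))

# Hypothetical predictions: action -> pain levels expected after taking it.
predicted = {
    "eat":  {"hunger": 0.0, "low money": 0.6},
    "work": {"hunger": 0.5, "low money": 0.2},
}
by_total = total_based_action(predicted)    # 'eat'  (total 0.6 vs 0.7)
by_minimax = minimax_action(predicted)      # 'work' (worst 0.5 vs 0.6)
```

The two criteria disagree here: the minimax choice accepts a slightly larger total in order to keep every individual pain lower.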

http://www.bradfordvts.co.uk/images/goal.jpg

Conclusions

The motivated learning method, based on a goal creation system, can improve learning of autonomous agents in a special class of problems.

ML is especially useful in complex, dynamic environments, where it works according to a learned hierarchy of goals.

Individual goals use well-known reinforcement learning algorithms to learn their corresponding value functions.

ML concerns building internal representations of useful environment percepts through interaction with the environment.

ML switches the machine’s attention and sets intended goals, becoming an important mechanism for a cognitive system.

„The real danger is not that computers will begin to think like man, but that man will begin to think like computers.”

Sydney J. Harris

