Exploration (Part 2)

CS 285

Instructor: Sergey Levine, UC Berkeley

Recap: what’s the problem?

[Figure: two example tasks, annotated "this is easy (mostly)" vs. "this is impossible".]

Why?

Unsupervised learning of diverse behaviors

What if we want to recover diverse behavior without any reward function at all?

Why?

➢ Learn skills without supervision, then use them to accomplish goals

➢ Learn sub-skills to use with hierarchical reinforcement learning

➢ Explore the space of possible behaviors

An Example Scenario

training time: unsupervised

How can you prepare for an unknown future goal?

In this lecture…

➢ Definitions & concepts from information theory

➢ Learning without a reward function by reaching goals

➢ A state distribution-matching formulation of reinforcement learning

➢ Is coverage of valid states a good exploration objective?

➢ Beyond state covering: covering the space of skills


Some useful identities
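The identities themselves were lost in extraction; reconstructing them, these are the standard definitions the rest of the lecture relies on (entropy, KL divergence, and mutual information):

$$\mathcal{H}(p(\mathbf{x})) = -\mathbb{E}_{\mathbf{x} \sim p(\mathbf{x})}[\log p(\mathbf{x})]$$

$$D_{\mathrm{KL}}(q(\mathbf{x}) \,\|\, p(\mathbf{x})) = \mathbb{E}_{\mathbf{x} \sim q(\mathbf{x})}\left[\log \frac{q(\mathbf{x})}{p(\mathbf{x})}\right]$$

$$\mathcal{I}(\mathbf{x}; \mathbf{y}) = D_{\mathrm{KL}}(p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})p(\mathbf{y})) = \mathcal{H}(p(\mathbf{y})) - \mathcal{H}(p(\mathbf{y} \mid \mathbf{x}))$$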

Information theoretic quantities in RL

The entropy of the policy's state marginal distribution quantifies coverage.

The mutual information between actions and the states they lead to, often called empowerment, can be viewed as quantifying "control authority" in an information-theoretic way.
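In symbols (a reconstruction consistent with the identities above):

$$\text{coverage:}\quad \mathcal{H}(p_\pi(\mathbf{s})) \qquad\qquad \text{empowerment:}\quad \mathcal{I}(\mathbf{s}_{t+1}; \mathbf{a}_t) = \mathcal{H}(\mathbf{s}_{t+1}) - \mathcal{H}(\mathbf{s}_{t+1} \mid \mathbf{a}_t)$$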



Learn without any rewards at all

(but there are many other choices)

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. '18.
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. '19.


How do we get diverse goals?

[Figure: proposed goals and the corresponding final states over the course of training; the proposed goals get higher entropy due to Skew-Fit.]

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. '18.
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. '19.
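A minimal sketch of the Skew-Fit reweighting behind that effect, assuming the generative model provides approximate log-densities log p_theta(s) for replay states (the names here are illustrative, not the authors' code): training data for the goal-proposal model is importance-weighted by p_theta(s)^alpha with alpha < 0, so rare states are oversampled and the entropy of proposed goals rises.

```python
import numpy as np

def skew_fit_weights(log_probs: np.ndarray, alpha: float = -1.0) -> np.ndarray:
    """Importance weights w(s) proportional to p_theta(s)^alpha, alpha in [-1, 0).

    Rare states (low log-prob under the current generative model) get larger
    weights, so retraining the model on reweighted data raises the entropy
    of the learned goal distribution.
    """
    assert alpha < 0.0
    log_w = alpha * log_probs   # log p(s)^alpha
    log_w -= log_w.max()        # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()          # normalize to a distribution over states

# usage: a common, a rarer, and a rarest state
log_p = np.array([-1.0, -5.0, -10.0])
print(skew_fit_weights(log_p))  # the rarest state gets the largest weight
```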

Reinforcement learning with imagined goals

[Figure: rollout labeled "imagined goal" and "RL episode": the policy is given a sampled goal and runs an episode attempting to reach it.]

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. '18.
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. '19.
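A compressed sketch of that loop, assuming hypothetical env, policy, vae, and replay_buffer objects whose method names are placeholders (not the published implementation):

```python
import numpy as np

def rig_iteration(env, policy, vae, replay_buffer, episode_len=50):
    """One iteration of RL with imagined goals (a sketch, per the assumptions above)."""
    # 1. imagine a goal: sample the VAE prior; the latent itself is the goal
    z_goal = vae.sample_prior()
    # 2. run an RL episode attempting to reach the imagined goal;
    #    reward is negative distance in latent space, which is far
    #    better shaped than raw pixel distance
    s = env.reset()
    for _ in range(episode_len):
        a = policy.act(s, z_goal)
        s_next = env.step(a)
        r = -np.linalg.norm(vae.encode(s_next) - z_goal)
        replay_buffer.add(s, a, r, s_next, z_goal)
        s = s_next
    # 3. improve the policy with any off-policy RL method, and refit the
    #    generative model (with Skew-Fit, reweight states by p(s)^alpha here)
    policy.update(replay_buffer)
    vae.fit(replay_buffer.states())
```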


Aside: exploration with intrinsic motivation

Can we use this for state marginal matching?

Lee*, Eysenbach*, Parisotto*, Xing, Levine, Salakhutdinov. Efficient Exploration via State Marginal Matching.
See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration.

State marginal matching for exploration

[Figure: exploration coverage comparison, MaxEnt on actions vs. variants of SMM; the SMM variants achieve much better coverage.]

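A sketch of the SMM-style intrinsic reward implied here, assuming learned density estimates for both the target marginal p*(s) and the policy's visitation p_pi(s) (names illustrative, not the paper's code):

```python
def smm_reward(log_p_target: float, log_p_pi: float) -> float:
    """r(s) = log p*(s) - log p_pi(s).

    High for states the target distribution favors but the current policy
    rarely visits. Minimizing D_KL(p_pi || p*) corresponds to maximizing
    the expected value of this reward under p_pi; since p_pi shifts as the
    policy improves, the density model must be refit alongside the policy.
    """
    return log_p_target - log_p_pi
```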


Is state entropy really a good objective?

The argument: if the test-time goal will be chosen adversarially, so we must prepare for the worst case, then the best training-time behavior is the one with maximal state coverage. Maximizing state entropy and matching a uniform state marginal are more or less the same thing.

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning

See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration
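One way to make that "same thing" precise (a reconstruction; the formal minimax statement is in Gupta et al.): matching a uniform target marginal is, up to a constant, exactly entropy maximization:

$$D_{\mathrm{KL}}(p_\pi(\mathbf{s}) \,\|\, p^*(\mathbf{s})) = \mathbb{E}_{p_\pi}\left[\log p_\pi(\mathbf{s}) - \log p^*(\mathbf{s})\right] = -\mathcal{H}(p_\pi(\mathbf{s})) + \mathrm{const} \quad \text{for uniform } p^*(\mathbf{s})$$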


Learning diverse skills

The policy π(a | s, z) is conditioned on a task index z.

Intuition: different skills should visit different state-space regions

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

Reaching diverse goals is not the same as performing diverse tasks

not all behaviors can be captured by goal-reaching

Diversity-promoting reward function

[Diagram: a skill z is sampled and fed to the policy (agent), which takes actions in the environment; a discriminator D observes the resulting states and tries to predict the skill z.]

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
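A minimal sketch of this reward, assuming a trained discriminator that returns log q(z|s) over all skills (the helper name is illustrative; the full DIAYN objective also adds a policy-entropy term):

```python
import numpy as np

def diversity_reward(disc_log_probs: np.ndarray, z: int, num_skills: int) -> float:
    """Reward the policy for reaching states that reveal which skill is active.

    disc_log_probs: discriminator outputs log q(z'|s) for every skill z'.
    r(s, z) = log q(z|s) - log p(z), with a uniform skill prior p(z).
    """
    log_p_z = -np.log(num_skills)  # uniform prior over skills
    return float(disc_log_probs[z] - log_p_z)

# usage: 4 skills, current state strongly indicates skill 2 is active
log_q = np.log(np.array([0.05, 0.05, 0.85, 0.05]))
print(diversity_reward(log_q, z=2, num_skills=4))  # positive: skill is identifiable
```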

Examples of learned tasks

[Videos: Cheetah, Ant, Mountain car.]

A connection to mutual information

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

See also: Gregor et al. Variational Intrinsic Control. 2016
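Spelling out the connection with the identities from earlier: the diversity objective can be read as maximizing the mutual information between the skill and the states it visits,

$$\mathcal{I}(\mathbf{z}; \mathbf{s}) = \mathcal{H}(\mathbf{z}) - \mathcal{H}(\mathbf{z} \mid \mathbf{s})$$

Sampling z from a fixed uniform prior keeps H(z) maximal, and since E[log q(z|s)] ≤ E[log p(z|s)] = -H(z|s) for any discriminator q, the log q(z|s) reward maximizes a variational lower bound on this mutual information.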