67
Reinforcement Learning of Context Models for Ubiquitous Computing Sofia ZAIDENBERG Laboratoire d’Informatique de Grenoble PRIMA Group 17/11/2010 1 PULSAR Research Meeting Under the supervision of Patrick REIGNIER and James L. CROWLEY

Reinforcement Learning of Context Models for Ubiquitous Computing

  • Upload
    elton

  • View
    27

  • Download
    0

Embed Size (px)

DESCRIPTION

Reinforcement Learning of Context Models for Ubiquitous Computing. Sofia Zaidenberg Laboratoire d’Informatique de Grenoble Prima Group . Under the supervision of Patrick Reignier and James L. Crowley. Ambient Computing. [Weiser, 1991] [Weiser, 1994] [Weiser and Brown, 1996]. - PowerPoint PPT Presentation

Citation preview

Page 1: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

1

Reinforcement Learningof Context Models

for Ubiquitous Computing

Sofia ZAIDENBERGLaboratoire d’Informatique de Grenoble

PRIMA Group

17/11/2010PULSAR Research Meeting

Under the supervision ofPatrick REIGNIER and James L. CROWLEY

Page 2: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Ambient Computing

17/11/2010

2 PULSAR Research Meeting

Ubiquitous Computing (ubicomp)

[Weiser, 1991][Weiser, 1994][Weiser and Brown, 1996]

Page 3: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

3 PULSAR Research Meeting 17/11/2010

Page 4: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

4 PULSAR Research Meeting 17/11/2010

Page 5: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Ambient Computing « autistic » devices

Independent Heterogeneous Unaware

Ubiquitous systems Accompany without imposing In periphery of the attention Invisible Calm computing

5 PULSAR Research Meeting 17/11/2010

Page 6: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

15 years later… Ubiquitous computing is already here [Bell and

Dourish, 2007] It’s not exactly like we expected

“Singapore, the intelligent island” “U-Korea”

Not seamless but messy Not invisible but flashy Characterized by improvisation and appropriation

Engaging user experiences [Rogers, 2006]

6 PULSAR Research Meeting 17/11/2010

Page 7: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

New directions Study habits and ways of living of people and

create technologies based on that (and not the opposite)[Barton and Pierce, 2006; Pascoe et al., 2007; Taylor et al., 2007; Jose, 2008]

Redefine smart technologies [Rogers, 2006]

Our goal: Provide an AmI application assisting the user

in his everyday activities

7 PULSAR Research Meeting 17/11/2010

Page 8: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Our goal Context-aware computing +

Personalization Situation + user action

8 PULSAR Research Meeting

Alice

Bob

1. Perception

2. Decision

17/11/2010

Page 9: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Personalization by

Learning

Proposed solution

9 PULSAR Research Meeting 17/11/2010

Page 10: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context model Experimentations and results Conclusion

10 PULSAR Research Meeting 17/11/2010

Page 11: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Proposed system A virtual assistant embodying the

ubiquitous system The assistant

Perceives the context using its sensors Executes actions using its actuators Receives user feedback for training Adapts its behavior to this feedback (learning)

11 PULSAR Research Meeting 17/11/2010

Page 12: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Constraints Simple training Fast learning Initial behavior consistency Life long learning User trust Transparence [Bellotti and Edwards, 2001]

Intelligibility System behavior understood by humans

Accountability System able to explain itself

12 PULSAR Research Meeting

The system is adapting to environment and preferences changes

17/11/2010

Page 13: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Example

13 PULSAR Research Meeting

J109 J120

hyperionReminder

!

17/11/2010

Page 14: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context model Experimentations and results Conclusion

14 PULSAR Research Meeting 17/11/2010

Page 15: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

User study Why a general public user study?

Original Weiser’s vision of calm computing revealed itself to be unsuited to current user needs

Objective Evaluate the expectations and needs vis-à-vis

“ambient computing” and its usages

15 PULSAR Research Meeting 17/11/2010

Page 16: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Terms of the study 26 interviewed subjects

Non-experts Distributed as follows:

16 PULSAR Research Meeting 17/11/2010

Sex

womenmentotal

Age group

18-2526-4040-6060+

Page 17: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Terms of the study 1 hour interviews with open discussion and

support of an interactive model Questions about advantages and drawbacks

of a ubiquitous assistant User ideas about interesting and useful

usages

17 PULSAR Research Meeting 17/11/2010

Page 18: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Results 44 % of subjects interested, 13 % conquered Profiles of interested subjects:

Very busy people Experiencing cognitive overload

Leaning considered as a plus More reliable systemGradual training vs. heavy configurationSimple and pleasant training (“one click”)

18 PULSAR Research Meeting 17/11/2010

Page 19: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

ResultsShort learning phaseExplanations are essential

Interactions Depending on the subject Optional debriefing phase

Mistakes accepted as long as the consequences are not critical

Use control Reveals subconscious customs Worry of dependence

19 PULSAR Research Meeting 17/11/2010

Page 20: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

20

Conclusions Constraints

Not a black box [Bellotti and Edwards, 2001] Intelligibility and

accountability Simple, not intrusive training Short training period, fast re-adaptation to

preference changes Coherent initial behavior

Build a ubiquitous assistant based on these constraints

PULSAR Research Meeting 17/11/2010

Page 21: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context model Experimentations and results Conclusion

21 PULSAR Research Meeting 17/11/2010

Page 22: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Ubiquitous system

22J109 J120

hyperion

Reminder!

Uses the existing devices

Heterogeneous

Scattered

System needs

OMiSCID

OSGi

Multiplatform systemDistributed systemCommunication protocolDynamic service discoveryEasily deployable

[Emonet et al., 2006]

PULSAR Research Meeting 17/11/2010

Page 23: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Modules interconnection

23 PULSAR Research Meeting

KEYBOARDACTIVITY

Sensors Actuators

EMAILS

LOCALIZATION

APPLICATIONS

TEXT TOSPEECH

REMOTECONTROL

PRESENCE

APPLICATIONS

EMAILS

17/11/2010

Page 24: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Example of message exchange

24J109 J120

hyperion

protee

Text2Speech on protee?

Yes!

OMiSCID

No…

Install and startText2Speech

Bundlerepository

PULSAR Research Meeting 17/11/2010

Page 25: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Database Contains

Static knowledge History of events and actions

To provide explanations

Centralized Queried Fed Simplifies queries

25 PULSAR Research Meeting

by all modules on all devices

17/11/2010

Page 26: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context

model Reinforcement learning Applying reinforcement learning

Experimentations and results Conclusion

26 PULSAR Research Meeting 17/11/2010

Page 27: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Reminder: our constraints Simple training Fast learning Initial behavior consistency Life long learning Explanations

27 PULSAR Research Meeting

Supervised[Brdiczka et al., 2007]

17/11/2010

Page 28: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Reinforcement learning (RL)

28 PULSAR Research Meeting

Markov property The state at

time t depends only on the state at time t-1

17/11/2010

Page 29: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Standard algorithm Q-Learning [Watkins, 1989]

Updates Q-values on a new experience{state, action, next state, reward} Slow because evolution only when something

happens Needs a lot of examples to learn a behavior

29 PULSAR Research Meeting 17/11/2010

Page 30: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

DYNA architecture

31 PULSAR Research Meeting

Agent

World

World model

ActionRewardState

DYNASwitch

[Sutton, 1991]

17/11/2010

Page 31: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Policy

DYNA architecture

32 PULSAR Research Meeting

Environment

World model

Use

Update

UpdateUpdate

Policy

Realinteractions

17/11/2010

Page 32: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Global system

33 PULSAR Research Meeting

EnvironmentDatabase

State

Action

Reward?Example

Action

World model

Realinteractions

Update

Update

Policy

Use

Perception

ExampleReward

Policy

17/11/2010

Page 33: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Modeling of the problem Components:

States Actions

34 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Policy

Components: Transition model Reward model

17/11/2010

Page 34: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

State space States defined by predicates

Understandable by humans (explanations) Examples :

newEmail ( from = Marc, to = Bob ) isInOffice ( John )

State-action: entrance( ) Pause music

35 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Policy

Predicates

System predicat

es

Environment predicates

Karl<+>

17/11/2010

Page 35: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

State space State split

newEmail( from= directeur, to= <+> )Notify

newEmail(from = newsletter, to= <+> )Do not notify

36 PULSAR Research Meeting 17/11/2010

Page 36: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Action space Possible actions combine

Forwarding a reminder to the user Notify of a new email Lock a computer screen Unlock a computer screen Pause the music playing on a computer Un-pause the music playing on a computer Do nothing

38 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Policy

17/11/2010

Page 37: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Reward Explicit reward

Through a non-intrusive user interface Problems with user rewards

Implicit reward Gathered from clues

(numerical value of lower amplitude) Smoothing of the model

39 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Policy

17/11/2010

Page 38: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

World model Built using supervised learning

From real examples

Initialized using common sense Functional system from the beginning Initial model vs. initial Q-values [Kaelbling, 2004] Extensibility

40 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Politique

Rewardmodel

TransitionmodelRewardmodel

17/11/2010

Page 39: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Transition model

41 PULSAR Research Meeting

s1 s2Starting states

Action or event

Modifications

Rewardmodel

Transitionmodel

+Probability

17/11/2010

Page 40: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Supervised learning ofthe transition model Database of examples{previous state, action, next state}

42 PULSAR Research Meeting

World model

Realinteractions

Use

Update

Update

Policy

s s’

t2t1

t3

s’tn+117/11/2010

Page 41: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Global system

43 PULSAR Research Meeting

EnvironmentDatabase

State

Action

Reward?Example

Action

World model

Realinteractions

Update

Update

Policy

Use

Perception

ExampleReward

Policy

17/11/2010

Page 42: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Episode Episode steps have 2 stages:

Select an event that modifies the state Select an action to react to that event

44 PULSAR Research Meeting

World model

Realinteractions

Update

Policy

Use Update

17/11/2010

Page 43: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Episode

45 PULSAR Research Meeting

RL agent

Environment

World model

Database

Learned from real interactions

or

Q-Learning updates

World model

Realinteractions

Update

Policy

Use Update

Policy

Experience

Policy

17/11/2010

Page 44: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context model Experimentations and results Conclusion

46 PULSAR Research Meeting 17/11/2010

Page 45: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Experimentations General public survey qualitative evaluation

Quantitative evaluations in 2 steps: Evaluation of the initial phase Evaluation of the system during normal

functioning

47 PULSAR Research Meeting 17/11/2010

Page 46: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Evaluation 1« about initial learning »

48 PULSAR Research Meeting 17/11/2010

Page 47: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Evaluation 1« about initial learning »

49 PULSAR Research Meeting

1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100

0

10

20

30

40

50

60

Initial episodes with events and initial statesrandomly chosen from the database

102550100

Episodes

Grad

e

Number ofiterationsper episode

17/11/2010

Page 48: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Evaluation 2« interactions and learning »

50 PULSAR Research Meeting

1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97 103

109

100

105

110

115

120

125

Episodes

Gra

de

17/11/2010

Page 49: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Evaluation 2« interactions and learning »

51 PULSAR Research Meeting

60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 18688

89

90

91

92

93

94

95

96

97

Episodes

Gra

des

17/11/2010

Page 50: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Outline Problem statement Learning in ubiquitous systems User Study Ubiquitous system Reinforcement learning of a context model Experimentations and results Conclusion

52 PULSAR Research Meeting 17/11/2010

Page 51: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Contributions Personalization of a ubiquitous system

Without explicit specification Easy to evolve

Adaptation of indirect reinforcement learningto a real-world problem Construction of a world model Injection of initial knowledge

Deployment of a prototype

53 PULSAR Research Meeting 17/11/2010

Page 52: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Perspectives

Non-interactive analyze of data

User interactions Debriefing

54 PULSAR Research Meeting 17/11/2010

Page 53: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Conclusion

The assistant is a means of creatingan ambient intelligence application The user is the one making it smart

55 PULSAR Research Meeting 17/11/2010

Page 54: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Thanks for your attention

Questions?

56 PULSAR Research Meeting 17/11/2010

Page 55: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Bibliography[Bellotti and Edwards, 2001]

Victoria BELLOTTI and Keith EDWARDS. « Intelligibility and accountability: human considerations in context-aware systems ». In Human-Computer Interaction, 2001.

[Brdiczka et al., 2007]

Oliver BRDICZKA, James L. CROWLEY and Patrick REIGNIER. « Learning Situation Models for Providing Context-Aware Services ». In Proceedings of HCI International, 2007.

[Buffet, 2003] Olivier Buffet. « Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs ». Thèse de doctorat, Université Henri Poincaré, 2003.

[Emonet et al., 2006]

Rémi Emonet, Dominique Vaufreydaz, Patrick Reignier and Julien Letessier. « O3MiSCID: an Object Oriented Opensource Middleware for Service Connection, Introspection and Discovery ». In IEEE International Workshop on Services Integration in Pervasive Environments, 2006.

[Kaelbling, 2004] Leslie Pack Kaelbling. « Life-Sized Learning ». Lecture at CSE Colloquia, 2004.

[Maes, 1994] Pattie MAES. « Agents that reduce work and information overload ». In Commun. ACM, 1994.

[Maisonnasse 2007]

Jerome MAISONNASSE, Nicolas GOURIER, Patrick REIGNIER and James L. CROWLEY. « Machine awareness of attention for non-disruptive services ». In HCI International, 2007.

[Moore, 1975] Gordon E. MOORE. « Progress in digital integrated electronics ». In Proc. IEEE International Electron Devices Meeting,1975.57 PULSAR Research Meeting 17/11/201

0

Page 56: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Bibliography[Nonogaki and Ueda, 1991]

Hajime Nonogaki and Hirotada Ueda. « FRIEND21project: a construction of 21st century human interface ». In CHI '91: Proceedings of the SIGCHI conference on Human factors in computing systems, 1991.

[Roman et al., 2002]

Manuel ROMAN, Christopher K. HESS, Renato CERQUEIRA,Anand RANGANATHAN, Roy H. CAMPBELL and Klara NAHRSTEDT. « Gaia: A Middleware Infrastructure to Enable Active Spaces ». In IEEE Pervasive Computing, 2002.

[Sutton, 1991] Richard S. Sutton. « Dyna, an integrated architecture for learning, planning, and reacting ». In SIGART Bull, 1991.

[Weiser, 1991] Mark WEISER. « The computer for the 21st century ». In Scientic American, 1991.

[Weiser, 1994] Mark WEISER. « Some computer science issues in ubiquitous computing ». In Commun. ACM, 1993.

[Weiser et Brown, 1996]

Mark WEISER and John Seely BROWN. « The coming age of calm technology ». http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm, 1996.

[Watkins, 1989] CJCH Watkins. « Learning from Delayed Rewards ». Thèse de doctorat, University of Cambridge, 1989.

58 PULSAR Research Meeting 17/11/2010

Page 57: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Interconnexion des modules

65 PULSAR Research Meeting

ACTIVITÉ

CLAVIER

Capteurs Actionneurs

EMAILS

LOCALISATION

PRÉSENCE

APPLICATIONS

APPLICATIONS

EMAILSSYNTHÈSE

VOCALE

CONTRÔLE

DISTANT

17/11/2010

Page 58: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Service OMiSCID

66 PULSAR Research Meeting 17/11/2010

Page 59: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Définition d’un état

67 PULSAR Research Meeting

Prédicat Argumentsalarm title, hour, minutexActivity machine, isActiveinOffice user, officeabsent userhasUnreadMail from, to, subject, bodyentrance isAlone, friendlyName, btAddressexit isAlone, friendlyName, btAddresstask taskNameuser loginuserOffice office, loginuserMachine machine, logincomputerState machine, isScreenLocked, isMusicPaused

17/11/2010

Page 60: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Modèle de l’environnement

68 PULSAR Research Meeting

ModèleÉtat E[état suivant]

E[renforcement]

ActionÉvénement

ou

17/11/2010

Page 61: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Réduction de l’espace d’états Accélération de l’apprentissage

Factorisation d’états

Division d’états

69 PULSAR Research Meeting

État Action Q-valeur

…entrance(isAlone=true, friendlyName=<+>, btAddress=<+>)…

pauseMusic 125.3

État Action Q-valeur

…hasUnreadMail(from=boss, to=<+>,subject=<+>, body=<+>)…

inform 144.02

…hasUnreadMail(from=newsletter, to=<+>, subject=<+>, body=<+>)…

notInform 105

Jokers<*> et <+>

17/11/2010

Page 62: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Le simulateur de l’environnement

70 PULSAR Research Meeting 17/11/2010

Page 63: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Critère d’évaluation : la note Résultat de l’AR : une Q-table Comment savoir si elle est « bonne » ? Apprentissage réussi si

Comportement correspond aux souhaits de l’utilisateur

Et c’est mieux si on a beaucoup exploré et si on a une estimation du comportement dans beaucoup d’états

71 PULSAR Research Meeting 17/11/2010

Page 64: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

« Le tableau de bord »

72 PULSAR Research Meeting

Permet d’envoyer par un clic les mêmes événements que les capteurs

17/11/2010

Page 65: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Modèle de récompense Ensemble d’entrées spécifiant

Des contraintes sur certains arguments de l’état Une action La récompense

73 PULSAR Research Meeting 17/11/2010

Page 66: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Modèle de récompense

74 PULSAR Research Meeting

s1

-50

États de départ

Action

Récompense

Modèle de transitionModèle de

récompense

17/11/2010

Page 67: Reinforcement  Learning of  Context  Models for  Ubiquitous Computing

Apprentissage supervisédu modèle de récompense La base de données contient des exemples{état précédent, action, récompense}

75 PULSAR Research Meeting

Modèle du monde

Interactionsréelles

Utilisation

Mise-à-jour

Mise-à-jour

Politique

s a rsss

a r

s a r

e1

en+117/11/2010