Outline
•Dynamic Treatment Regimes
•Optimal Q-functions and Q-learning
•The Problem & Goal
•Finite Sample Bounds
•Outline of Proof
•Shortcomings and Open Problems
Dynamic Treatment Regimes
• Multi-stage decision problems: repeated decisions are made over time on each patient.
• Used in the management of addictions, mental illnesses, HIV infection and cancer.
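A dynamic treatment regime is a vector of decision rules, one per decision: $d = (d_1, \dots, d_T)$, where $d_j$ maps the history $S_j = (X_1, A_1, \dots, A_{j-1}, X_j)$ of observations $X_j$ and actions $A_j$ to an action.
If the regime is implemented, then $A_j = d_j(S_j)$ for $j = 1, \dots, T$.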
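A minimal sketch of a regime as a vector of decision rules, each mapping the history available at its decision to an action. The tuple encoding of the history, the binary actions and both rules are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical history encoding: S_j = (X_1, A_1, ..., A_{j-1}, X_j) with scalar entries.
History = Tuple[int, ...]
Rule = Callable[[History], int]

def act(regime: Sequence[Rule], j: int, history: History) -> int:
    """Implementing the regime: the action at decision j (1-based) is A_j = d_j(S_j)."""
    return regime[j - 1](history)

# Hypothetical two-decision regime with binary actions (0 = control, 1 = treat):
# d_1 treats iff X_1 == 1; d_2 keeps A_1 if the latest observation X_2 == 1, else switches.
regime: Tuple[Rule, Rule] = (
    lambda s: 1 if s[0] == 1 else 0,
    lambda s: s[1] if s[-1] == 1 else 1 - s[1],
)

print(act(regime, 1, (1,)))       # history (X_1,) = (1,)            -> 1
print(act(regime, 2, (1, 1, 0)))  # history (X_1, A_1, X_2) = (1, 1, 0) -> 0
```

Because each rule receives the whole history, a later decision can depend on earlier actions as well as on observations.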
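Goal: Estimate the decision rules that maximize the mean of the total reward $\sum_{j=1}^{T} R_j$, where $R_j$ is the reward following action $A_j$.
Data: A data set of n finite horizon trajectories $(X_1, A_1, R_1, \dots, X_T, A_T, R_T)$, each with randomized actions.
The probabilities $p_j(a_j \mid S_j)$, with $S_j$ the history up to decision $j$, are the randomization probabilities.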
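A minimal sketch of how such a data set might be generated, assuming T = 2, binary observations and actions, and a hypothetical randomization design in which the probability of treatment at decision 1 depends on $X_1$. The reward and transition models are also hypothetical; the point is that each trajectory records the randomization probability of the action actually given.

```python
import random
from typing import List, NamedTuple, Tuple

class Trajectory(NamedTuple):
    """One hypothetical trajectory with T = 2 decisions and binary X_j, A_j."""
    x1: int
    a1: int
    p1: float  # randomization probability of the action actually given, P(A_1 = a1 | S_1)
    r1: float
    x2: int
    a2: int
    p2: float  # P(A_2 = a2 | S_2)
    r2: float

def treat_prob_1(x1: int) -> float:
    """Hypothetical design: P(A_1 = 1 | S_1) depends on X_1."""
    return 0.7 if x1 == 1 else 0.3

def treat_prob_2(x1: int, a1: int, x2: int) -> float:
    """Hypothetical design: P(A_2 = 1 | S_2) = 0.5 for every history."""
    return 0.5

def randomize(rng: random.Random, p_treat: float) -> Tuple[int, float]:
    """Draw a binary action; return it together with the probability of that draw."""
    a = 1 if rng.random() < p_treat else 0
    return a, (p_treat if a == 1 else 1.0 - p_treat)

def simulate(n: int, seed: int = 0) -> List[Trajectory]:
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        a1, p1 = randomize(rng, treat_prob_1(x1))
        r1 = 0.0                                    # hypothetical: no reward after decision 1
        x2 = 1 if rng.random() < (0.8 if a1 == x1 else 0.2) else 0
        a2, p2 = randomize(rng, treat_prob_2(x1, a1, x2))
        r2 = x2 + (1.0 if a2 == x2 else 0.0)        # hypothetical reward model
        data.append(Trajectory(x1, a1, p1, r1, x2, a2, p2, r2))
    return data

data = simulate(1000)
```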
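Optimal Q-functions and Q-learning:
Definition: For a regime $d$, $E_d$ denotes expectation when the actions are chosen according to the regime $d$; the mean outcome of $d$ is $E_d\left[\sum_{j=1}^{T} R_j\right]$. The optimal Q-functions are defined backward in time:
$$Q_T(s_T, a_T) = E\left[R_T \mid S_T = s_T, A_T = a_T\right],$$
$$Q_j(s_j, a_j) = E\left[R_j + \max_{a_{j+1}} Q_{j+1}(S_{j+1}, a_{j+1}) \,\middle|\, S_j = s_j, A_j = a_j\right], \quad j = T-1, \dots, 1.$$
The optimal decision rules are $d_j(s_j) = \arg\max_{a_j} Q_j(s_j, a_j)$.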
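The backward recursion can be computed exactly when the distribution is known. A minimal sketch for T = 2 with binary observations and actions, assuming a hypothetical model in which matching $A_1$ to $X_1$ raises the chance that $X_2 = 1$, and $R_1 = 0$:

```python
from typing import Tuple

ACTIONS = (0, 1)

def p_x2_is_1(x1: int, a1: int) -> float:
    """Hypothetical model: P(X_2 = 1 | X_1 = x1, A_1 = a1); matching A_1 to X_1 helps."""
    return 0.8 if a1 == x1 else 0.2

def mean_r2(x2: int, a2: int) -> float:
    """Hypothetical model: E[R_2 | S_2, A_2] = X_2 + 1{A_2 = X_2}. R_1 is taken to be 0."""
    return x2 + (1.0 if a2 == x2 else 0.0)

def q2(s2: Tuple[int, int, int], a2: int) -> float:
    # Q_T with T = 2: Q_2(s_2, a_2) = E[R_2 | S_2 = s_2, A_2 = a_2]; s_2 = (x1, a1, x2)
    return mean_r2(s2[2], a2)

def q1(x1: int, a1: int) -> float:
    # Q_1(s_1, a_1) = E[R_1 + max_{a_2} Q_2(S_2, a_2) | S_1 = s_1, A_1 = a_1], with R_1 = 0
    p = p_x2_is_1(x1, a1)
    best = {x2: max(q2((x1, a1, x2), a2) for a2 in ACTIONS) for x2 in (0, 1)}
    return p * best[1] + (1.0 - p) * best[0]

def d1(x1: int) -> int:
    """Optimal rule at decision 1: argmax over a_1 of Q_1."""
    return max(ACTIONS, key=lambda a: q1(x1, a))

def d2(s2: Tuple[int, int, int]) -> int:
    """Optimal rule at decision 2: argmax over a_2 of Q_2."""
    return max(ACTIONS, key=lambda a: q2(s2, a))
```

Under this toy model, $Q_1(0, 0) = 0.8 \cdot 2 + 0.2 \cdot 1 = 1.8$ and $Q_1(0, 1) = 0.2 \cdot 2 + 0.8 \cdot 1 = 1.2$, so the optimal first rule sets $A_1 = X_1$ and the optimal second rule sets $A_2 = X_2$.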
The Problem & Goal:
Most learning (e.g., estimation) methods use a model for all or part of the multivariate distribution of the trajectory; for Q-learning, this is a model $\mathcal{Q}_j$ for each Q-function $Q_j$.
The model implicitly constrains the class of possible decision rules in the dynamic treatment regime: call this constrained class $\mathcal{D}$.
The observations $X_j$ are vectors with many components (high dimensional), so the model is likely incorrect; view $\mathcal{Q}$ and $\mathcal{D}$ as approximation classes.
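To see how a model for the Q-function implicitly constrains the decision rules, consider a hypothetical linear working model with a scalar history $s$ and a binary action. A minimal sketch:

```python
from typing import Callable, Sequence

def linear_q(theta: Sequence[float]) -> Callable[[float, int], float]:
    """Hypothetical working model Q(s, a; theta) = t0 + t1*s + a*(t2 + t3*s), scalar s, binary a."""
    t0, t1, t2, t3 = theta
    return lambda s, a: t0 + t1 * s + a * (t2 + t3 * s)

def induced_rule(theta: Sequence[float]) -> Callable[[float], int]:
    """Decision rule implied by the model: d(s) = argmax_a Q(s, a; theta), ties broken toward a = 0."""
    q = linear_q(theta)
    return lambda s: 1 if q(s, 1) > q(s, 0) else 0

# Q(s, 1) - Q(s, 0) = t2 + t3*s is linear in s, so every induced rule is
# "treat iff t2 + t3*s > 0": a one-sided threshold on s (or a constant rule).
d = induced_rule((0.0, 0.0, -1.0, 2.0))    # treat iff s > 0.5
print([d(s) for s in (0.2, 0.4, 0.6, 0.8)])  # [0, 0, 1, 1]
```

Whatever the fitted $\theta$, this model can never produce a rule such as "treat iff $0.3 < s < 0.7$": the class of rules it can reach is the class of threshold rules.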
Goal: Given a learning method and approximation classes, assess the ability of the learning method to produce the best decision rules in the class.
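Ideally, construct an upper bound for
$$\max_{d \in \mathcal{D}} E_d\left[\sum_{j=1}^{T} R_j\right] - E_{\hat{d}}\left[\sum_{j=1}^{T} R_j\right],$$
where $\hat{d}$ is the estimator of the regime, $\mathcal{D}$ is the constrained class of decision rules, and $E_d$ denotes expectation when the actions are chosen according to the rule $d$.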
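For a small, fully specified toy problem this difference can be computed exactly. A minimal sketch, assuming a hypothetical model with T = 2, $P(X_1 = 1) = 0.5$, $R_1 = 0$, and a hypothetical class $\mathcal{D}$ in which $d_1$ depends only on $X_1$ and $d_2$ only on $X_2$ (16 regimes); `d_hat` stands in for any estimated regime.

```python
from itertools import product
from typing import Dict, Tuple

ACTIONS = (0, 1)
Rule = Tuple[int, int]   # a rule on a binary input, stored as (d(0), d(1))

def p_x2_is_1(x1: int, a1: int) -> float:
    """Hypothetical model: P(X_2 = 1 | X_1 = x1, A_1 = a1)."""
    return 0.8 if a1 == x1 else 0.2

def mean_r2(x2: int, a2: int) -> float:
    """Hypothetical model: E[R_2 | X_2, A_2]; R_1 = 0 and P(X_1 = 1) = 0.5."""
    return x2 + (1.0 if a2 == x2 else 0.0)

def mean_outcome(d1: Rule, d2: Rule) -> float:
    """E_d[R_1 + R_2] computed exactly when A_1 = d1[X_1] and A_2 = d2[X_2]."""
    total = 0.0
    for x1 in (0, 1):
        p = p_x2_is_1(x1, d1[x1])
        for x2, px2 in ((1, p), (0, 1.0 - p)):
            total += 0.5 * px2 * mean_r2(x2, d2[x2])
    return total

# Hypothetical class D: every pair of rules d_1(X_1), d_2(X_2) on binary inputs.
CLASS = list(product(product(ACTIONS, repeat=2), repeat=2))
values: Dict[Tuple[Rule, Rule], float] = {d: mean_outcome(*d) for d in CLASS}
best_value = max(values.values())

def shortfall(d_hat: Tuple[Rule, Rule]) -> float:
    """max_{d in D} E_d[sum_j R_j] - E_{d_hat}[sum_j R_j]."""
    return best_value - values[d_hat]

# d_hat: always treat at decision 1, then A_2 = X_2.
print(round(best_value, 6), round(shortfall(((1, 1), (0, 1))), 6))  # 1.8 0.3
```

In practice the distribution is unknown and $\hat{d}$ depends on the data, which is why a finite sample bound, rather than an exact computation like this one, is the goal.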
Goal: Given a learning method, a model, and an approximation class, construct a finite sample upper bound on the generalization error of the estimated regime $\hat{d}$.
This upper bound should be composed of quantities that are minimized in the learning method.
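The learning method is Q-learning: moving backward from $j = T$ to $j = 1$, fit a model for $Q_j$ by least squares with response $R_j + \max_{a_{j+1}} \hat{Q}_{j+1}(S_{j+1}, a_{j+1})$ (just $R_T$ at $j = T$), then set $\hat{d}_j(s_j) = \arg\max_{a_j} \hat{Q}_j(s_j, a_j)$.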
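A minimal sketch of Q-learning with T = 2 on hypothetical simulated data with randomized actions ($P(A_j = 1) = 0.5$). To keep it short, the approximation class is the saturated one (one indicator per history-action cell), so each least squares fit reduces to cell means; with high-dimensional $X_j$ one would instead fit a restricted model, which is where the approximation classes come in. The function names are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ACTIONS = (0, 1)
Row = Tuple[int, int, float, int, int, float]   # (X1, A1, R1, X2, A2, R2)

def simulate(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical randomized data: P(A_j = 1) = 0.5 at both decisions, R_1 = 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, a1 = rng.randint(0, 1), rng.randint(0, 1)
        x2 = 1 if rng.random() < (0.8 if a1 == x1 else 0.2) else 0
        a2 = rng.randint(0, 1)
        rows.append((x1, a1, 0.0, x2, a2, x2 + (1.0 if a2 == x2 else 0.0)))
    return rows

def cell_means(pairs: Iterable[Tuple[tuple, float]]) -> Dict[tuple, float]:
    """Least squares on one indicator per (history, action) cell = mean response per cell."""
    sums: Dict[tuple, float] = defaultdict(float)
    counts: Dict[tuple, int] = defaultdict(int)
    for key, y in pairs:
        sums[key] += y
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

def greedy(q: Dict[tuple, float], s: tuple) -> int:
    """Estimated rule: argmax over actions of the fitted Q at history s (unseen cells lose)."""
    return max(ACTIONS, key=lambda a: q.get(s + (a,), float("-inf")))

def q_learning(rows: List[Row]) -> Tuple[Dict[tuple, float], Dict[tuple, float]]:
    # Decision 2 (= T): regress R_2 on (S_2, A_2), with S_2 = (X_1, A_1, X_2).
    q2 = cell_means(((x1, a1, x2, a2), r2) for x1, a1, r1, x2, a2, r2 in rows)
    # Decision 1: regress R_1 + max_{a_2} Q2_hat(S_2, a_2) on (S_1, A_1), with S_1 = (X_1,).
    q1 = cell_means(
        ((x1, a1), r1 + q2[(x1, a1, x2, greedy(q2, (x1, a1, x2)))])
        for x1, a1, r1, x2, a2, r2 in rows
    )
    return q1, q2

q1_hat, q2_hat = q_learning(simulate(20_000))
print([greedy(q1_hat, (x1,)) for x1 in (0, 1)])   # estimated d_1 at X_1 = 0 and X_1 = 1
```

The fitted first-stage response is the pseudo-outcome $R_1 + \max_{a_2} \hat{Q}_2$, not the observed $R_2$, so the estimated $\hat{d}_1$ accounts for acting optimally at the second decision.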
Open Problems
• Is there a learning method that can learn the best decision rule in an approximation class given a data set of n finite horizon trajectories?
• Sieve Estimators or Regularized Estimators?
• Dealing with high-dimensional X: feature extraction and feature selection.
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/ims_bernoulli_0704.ppt
The paper can be found at:
http://www.stat.lsa.umich.edu/~samurphy/papers/Qlearning.pdf