Journal article

VariBAD: variational Bayes-adaptive deep RL via meta-learning

Abstract:
Trading off exploration and exploitation in an unknown environment is key to maximising expected online return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn approximately Bayes-optimal policies for complex tasks. VariBAD simultaneously meta-learns a variational auto-encoder to perform approximate inference, and a policy that incorporates task uncertainty directly during action selection by conditioning on both the environment state and the approximate belief. In two toy domains, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo tasks widely used in meta-RL and show that it achieves higher online return than existing methods. On the recently proposed Meta-World ML1 benchmark, variBAD achieves state-of-the-art results by a large margin, fully solving two out of the three ML1 tasks for the first time.
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://jmlr.org/papers/v22/21-0657.html

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author


Publisher:
Journal of Machine Learning Research
Journal:
Journal of Machine Learning Research
Volume:
22
Issue:
289
Pages:
1-39
Publication date:
2021-11-21
Acceptance date:
2021-09-21
ISSN:
1532-4435


Language:
English
Pubs id:
1215279
Local pid:
pubs:1215279
Deposit date:
2021-12-02
