Conference item
Bayesian Bellman operators
- Abstract:
- We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian Bellman operator (BBO) framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions. In this paper, we use BBO to provide a rigorous theoretical analysis of model-free Bayesian RL to better understand its relationship to established frequentist RL methodologies. We prove that Bayesian solutions are consistent with frequentist RL solutions, even when approximate inference is used, and derive conditions under which convergence properties hold. Empirically, we demonstrate that algorithms derived from the BBO framework have sophisticated deep exploration properties that enable them to solve continuous control tasks at which state-of-the-art regularised actor-critic algorithms fail catastrophically.
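The abstract's key claim, that bootstrapped model-free updates implicitly infer a posterior over Bellman operators rather than over value functions, can be made concrete with a toy sketch. The following is not the paper's BBO algorithm: it is a hypothetical illustration in which an ensemble of tabular Q-functions, each bootstrapping from its own value estimate, stands in for a posterior over Bellman targets; the chain environment, ensemble size, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch (not the authors' BBO algorithm): an ensemble of
# tabular Q-functions, each trained on its own bootstrapped target, so the
# spread across members acts as a crude proxy for uncertainty over the
# Bellman operator. All sizes and hyperparameters below are assumptions.
n_states, n_actions, n_ensemble = 5, 2, 10
gamma, lr = 0.99, 0.1
rng = np.random.default_rng(0)

# Random initialisation plays the role of a prior over value estimates.
Q = rng.normal(scale=0.1, size=(n_ensemble, n_states, n_actions))

def step(s, a):
    """Toy chain MDP (an assumed environment): action 1 moves right,
    action 0 resets to state 0; reward only upon reaching the last state."""
    s2 = min(s + 1, n_states - 1) if a == 1 else 0
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for episode in range(500):
    k = rng.integers(n_ensemble)  # Thompson-style: act with one sampled member
    s = 0
    for t in range(20):
        a = int(np.argmax(Q[k, s]))
        s2, r = step(s, a)
        # Each member bootstraps from ITS OWN estimate, so the implied
        # Bellman targets (and hence operators) differ across the ensemble.
        for i in range(n_ensemble):
            target = r + gamma * Q[i, s2].max()
            Q[i, s, a] += lr * (target - Q[i, s, a])
        s = s2

# Disagreement across members summarises the remaining operator uncertainty.
print("per state-action std across ensemble:\n", Q.std(axis=0).round(3))
```

Sampling one ensemble member per episode, rather than per step, commits the agent to a single value hypothesis for long enough to reach the distant reward; that commitment is the kind of deep exploration behaviour the abstract refers to.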
- Publication status:
- Published
- Peer review status:
- Reviewed (other)
Authors
- Fellows et al.
- Publisher:
- NeurIPS
- Journal:
- NeurIPS Proceedings 2021
- Volume:
- 34
- Pages:
- 13641-13656
- Publication date:
- 2022-04-01
- Acceptance date:
- 2021-11-01
- Event title:
- 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
- Language:
- English
- Keywords:
- Pubs id:
- 1211839
- Local pid:
- pubs:1211839
- Deposit date:
- 2021-11-23
Terms of use
- Copyright holder:
- Fellows et al.
- Copyright date:
- 2022
- Rights statement:
- Copyright © 2022 The Author(s).
- Notes:
- This is the accepted manuscript version of the article. The final version is available from NeurIPS at https://proceedings.neurips.cc/paper/2021/hash/7180cffd6a8e829dacfc2a31b3f72ece-Abstract.html