Conference item
Risk-averse Bayes-adaptive reinforcement learning
- Abstract:
- In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
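The CVaR objective described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, only an empirical estimate of CVaR at level alpha (the mean of the worst alpha-fraction of sampled returns); the function name and the sampling setup are illustrative assumptions.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns.

    Lower returns are worse; CVaR_alpha is the expected return conditional
    on falling in the worst alpha tail of the distribution.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return returns[:k].mean()

# Illustrative sampling scheme (an assumption, not the paper's method):
# sampling MDPs from the prior captures epistemic uncertainty, and rolling
# out each sampled MDP captures aleatoric uncertainty; taking the CVaR of
# the pooled returns makes the objective risk-averse to both.
rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=3.0, size=1000)
print(empirical_cvar(returns, alpha=0.1))
```

For alpha = 1 the estimate reduces to the ordinary expected return; smaller alpha weights the worst outcomes more heavily.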
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher:
- Neural Information Processing Systems Foundation
- Host title:
- Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
- Volume:
- 34
- Pages:
- 1142-1154
- Publication date:
- 2021-12-06
- Acceptance date:
- 2021-07-17
- Event title:
- 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
- Event location:
- Virtual event
- Event website:
- https://nips.cc/Conferences/2021
- Event start date:
- 2021-12-06
- Event end date:
- 2021-12-14
- Language:
- English
- Keywords:
- Pubs id:
- 1242852
- Local pid:
- pubs:1242852
- Deposit date:
- 2022-03-09
Terms of use
- Copyright date:
- 2021
- Notes:
- This is the accepted manuscript version of the paper. The final version is available from the Neural Information Processing Systems Foundation at: https://proceedings.neurips.cc/paper/2021/hash/08f90c1a417155361a5c4b8d297e0d78-Abstract.html