
Conference item

Risk-averse Bayes-adaptive reinforcement learning

Abstract:
In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
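For readers unfamiliar with the objective, the conditional value at risk (CVaR) at level alpha of the total return is the expected return over the worst alpha-fraction of outcomes. The following minimal sketch (not the authors' algorithm; the names empirical_cvar, sample_mdp_from_prior and rollout are hypothetical) illustrates how such a quantity could be estimated by sampling an MDP from the prior (epistemic uncertainty), rolling out a policy in that MDP (aleatoric uncertainty), and averaging the lowest returns:

import numpy as np

def empirical_cvar(returns, alpha=0.1):
    # Mean of the worst alpha-fraction of sampled total returns (lower tail).
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Hypothetical usage: sample_mdp_from_prior() and rollout() stand in for a
# prior over MDPs and a policy rollout; both sources of uncertainty enter
# the return distribution before the CVaR is taken.
# returns = [rollout(policy, sample_mdp_from_prior()) for _ in range(1000)]
# risk = empirical_cvar(returns, alpha=0.1)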
Publication status:
Published
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
Neural Information Processing Systems Foundation
Host title:
Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Volume:
34
Pages:
1142-1154
Publication date:
2021-12-06
Acceptance date:
2021-07-17
Event title:
35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Event location:
Virtual event
Event website:
https://nips.cc/Conferences/2021
Event start date:
2021-12-06
Event end date:
2021-12-14


Language:
English
Pubs id:
1242852
Local pid:
pubs:1242852
Deposit date:
2022-03-09
