Conference item
Risk-averse Bayes-adaptive reinforcement learning
- Abstract:
- In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
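The CVaR objective described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, only an empirical estimate of CVaR at level alpha (the mean of the worst alpha-fraction of sampled returns); the function name and the sampling setup are illustrative assumptions.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns.

    Lower returns are worse; CVaR_alpha is the expected return conditional
    on falling in the worst alpha tail of the distribution.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return returns[:k].mean()

# Illustrative sampling scheme (an assumption, not the paper's method):
# sampling MDPs from the prior captures epistemic uncertainty, and rolling
# out each sampled MDP captures aleatoric uncertainty; taking the CVaR of
# the pooled returns makes the objective risk-averse to both.
rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=3.0, size=1000)
print(empirical_cvar(returns, alpha=0.1))
```

For alpha = 1 the estimate reduces to the ordinary expected return; smaller alpha weights the worst outcomes more heavily.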
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher:
- Neural Information Processing Systems Foundation
- Host title:
- Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
- Volume:
- 34
- Pages:
- 1142-1154
- Publication date:
- 2021-12-06
- Acceptance date:
- 2021-07-17
- Event title:
- 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
- Event location:
- Virtual event
- Event website:
- https://nips.cc/Conferences/2021
- Event start date:
- 2021-12-06
- Event end date:
- 2021-12-14
- Language:
- English
- Keywords:
- Pubs id:
- 1242852
- Local pid:
- pubs:1242852
- Deposit date:
- 2022-03-09
Terms of use
- Copyright date:
- 2021
- Notes:
- This is the accepted manuscript version of the paper. The final version is available from the Neural Information Processing Systems Foundation at: https://proceedings.neurips.cc/paper/2021/hash/08f90c1a417155361a5c4b8d297e0d78-Abstract.html