Conference item

Alternating optimisation and quadrature for robust control

Abstract:
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy. Experimental results across different domains show that ALOQ can learn more efficiently and robustly than existing methods.
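The role of rare events described in the abstract can be illustrated with a toy problem. The sketch below is not ALOQ and uses no Gaussian processes, Bayesian optimisation, or Bayesian quadrature; it only shows, under assumed numbers, why a policy tuned without accounting for a rare setting of an environment variable can differ sharply from one that maximises the expected return over that variable. The reward function, the rare-event probability `P_RARE`, and the candidate grid are all hypothetical and do not come from the paper.

```python
# Hypothetical toy problem (not from the paper): a 1-D policy parameter and a
# binary environment variable theta that a simulator could set directly.

P_RARE = 0.02  # assumed probability of the rare event (theta = 1)


def reward(policy: float, theta: int) -> float:
    """Return for a given policy under environment variable theta.

    theta = 0: normal conditions, reward peaks at policy = 1.0.
    theta = 1: rare event, a large penalty that grows with the policy value.
    """
    if theta == 0:
        return -(policy - 1.0) ** 2
    return -50.0 * policy ** 2


def expected_reward(policy: float) -> float:
    """Exact expectation over theta, weighting each setting by its probability."""
    return (1.0 - P_RARE) * reward(policy, 0) + P_RARE * reward(policy, 1)


def best_policy(objective, candidates):
    """Pick the candidate with the highest objective value (plain grid search)."""
    return max(candidates, key=objective)


candidates = [i / 100 for i in range(0, 151)]

# Ignoring the rare event, as a small random sample of theta would likely do.
naive = best_policy(lambda p: reward(p, 0), candidates)

# Marginalising over theta by setting it explicitly, as a simulator allows.
robust = best_policy(expected_reward, candidates)

print(f"naive policy:  {naive:.2f}, expected reward {expected_reward(naive):.3f}")
print(f"robust policy: {robust:.2f}, expected reward {expected_reward(robust):.3f}")
```

In this sketch the expectation over the environment variable is an exact two-term weighted sum, and the search is a plain grid. The abstract states that ALOQ addresses this setting with Bayesian quadrature and Bayesian optimisation instead.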
Publication status:
Published
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS Division
Department:
Engineering Science
Role:
Author


Publisher:
AAAI Press
Host title:
32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Journal:
32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Pages:
3925-3933
Publication date:
2018-04-29
Acceptance date:
2017-11-09
ISSN:
2159-5399


Pubs id:
pubs:745008
UUID:
uuid:abd7c997-b0fb-4e66-b601-f82184500cbf
Local pid:
pubs:745008
Source identifiers:
745008
Deposit date:
2017-11-11