Conference item

Fingerprint policy optimisation for robust reinforcement learning

Abstract:

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the ...

Publication status:
Published
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS Division
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author
Publisher:
Journal of Machine Learning Research
Journal:
Thirty-Sixth International Conference on Machine Learning (ICML 2019)
Volume:
97
Pages:
5082-5091
Series:
Machine Learning
Host title:
Proceedings of Machine Learning Research
Publication date:
2019-06-11
Acceptance date:
2019-05-14
ISSN:
2640-3498
Source identifiers:
998018
Pubs id:
pubs:998018
UUID:
uuid:0bd8f1b9-236f-4348-90a8-8bf5fbd77d85
Local pid:
pubs:998018
Deposit date:
2019-05-14
