
Conference item

Fingerprint policy optimisation for robust reinforcement learning

Abstract:

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the ...
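A toy illustration of the problem the abstract describes. This is not the FPO algorithm. It assumes a hypothetical binary environment variable theta that takes a rare value with probability 0.01, and a hypothetical return that is much worse in that rare case. When theta is drawn by the environment, as in a physical setting, a small batch of episodes usually never sees the rare case, so estimates of the expected return are noisy. When a simulator lets us set theta directly, each setting can be evaluated and weighted by its known probability. The names P_RARE, episode_return, naive_estimate and controlled_estimate are hypothetical and chosen only for this sketch.

```python
import random
import statistics

# Hypothetical probability that the environment produces the rare setting theta = 1.
P_RARE = 0.01


def episode_return(theta: int) -> float:
    """Hypothetical return: the rare setting theta = 1 is catastrophic."""
    return -50.0 if theta == 1 else 1.0


def naive_estimate(n: int, rng: random.Random) -> float:
    """Monte Carlo estimate where theta is drawn by the environment (not controlled)."""
    return statistics.fmean(
        episode_return(1 if rng.random() < P_RARE else 0) for _ in range(n)
    )


def controlled_estimate() -> float:
    """Simulator sets theta explicitly; weight each outcome by its known probability."""
    return (1 - P_RARE) * episode_return(0) + P_RARE * episode_return(1)


if __name__ == "__main__":
    rng = random.Random(0)
    # 1000 independent estimates, each from a batch of 20 episodes.
    naive = [naive_estimate(20, rng) for _ in range(1000)]
    print(f"controlled estimate: {controlled_estimate():.2f}")
    print(f"naive mean: {statistics.fmean(naive):.2f}")
    print(f"naive stdev: {statistics.stdev(naive):.2f}")
```

With these toy numbers the true expectation is 0.99 × 1 + 0.01 × (−50) = 0.49, whereas most 20-episode naive estimates come out at exactly 1.0 because they contain no rare episodes.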

Publication status: Published
Peer review status: Peer reviewed
Version: Publisher's Version


Authors


Institution: University of Oxford
Division: MPLS Division
Department: Computer Science
Institution: University of Oxford
Division: MPLS Division
Department: Engineering Science
Institution: University of Oxford
Division: MPLS Division
Department: Computer Science
Oxford college: St Catherine's College
Publisher: Journal of Machine Learning Research
Volume: 97
Pages: 5082-5091
Publication date: 2019-06-11
Acceptance date: 2019-05-14
ISSN: 2640-3498
Pubs id: pubs:998018
URN: uri:0bd8f1b9-236f-4348-90a8-8bf5fbd77d85
UUID: uuid:0bd8f1b9-236f-4348-90a8-8bf5fbd77d85
Local pid: pubs:998018
