Conference item
Fingerprint policy optimisation for robust reinforcement learning
- Abstract:
- Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the ...
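The setting the abstract describes can be illustrated with a minimal toy sketch: a policy trained by gradient ascent to be optimal in expectation over an unobservable environment variable. The reward function, uniform sampling of the environment variable, and all names below are illustrative assumptions, not the paper's FPO algorithm, which (per the abstract) concerns actively adjusting the environment variable rather than sampling it blindly.

```python
import random

# Toy reward (an assumption, not from the paper): the unobservable
# environment variable theta shifts where the optimal action lies.
def reward(action, env_var):
    return -(action - env_var) ** 2

def train(env_vars, lr=0.05, iters=2000, seed=0):
    """Gradient ascent on a single scalar action; the simulator draws
    the environment variable afresh each iteration."""
    rng = random.Random(seed)
    a = 0.0
    for _ in range(iters):
        theta = rng.choice(env_vars)     # environment variable for this episode
        a += lr * (-2.0 * (a - theta))   # analytic gradient of the toy reward
    return a

# The optimal-in-expectation action is the mean of the environment variables.
policy = train([0.0, 1.0, 4.0])
```

Under uniform sampling the iterates hover around the mean of `env_vars` (5/3 here); the point of a method like FPO is to choose which environment variable to train against instead of leaving it to chance.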
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- (Version of record, pdf, 144.1KB)
- (Version of record, pdf, 608.1KB)
-
Bibliographic Details
- Publisher:
- Journal of Machine Learning Research
- Journal:
- Thirty-Sixth International Conference on Machine Learning (ICML 2019)
- Volume:
- 97
- Pages:
- 5082-5091
- Series:
- Machine Learning
- Host title:
- Proceedings of Machine Learning Research
- Publication date:
- 2019-06-11
- Acceptance date:
- 2019-05-14
- ISSN:
- 2640-3498
- Source identifiers:
- 998018
Item Description
- Pubs id:
- pubs:998018
- UUID:
- uuid:0bd8f1b9-236f-4348-90a8-8bf5fbd77d85
- Local pid:
- pubs:998018
- Deposit date:
- 2019-05-14
Terms of use
- Copyright holder:
- Paul et al.
- Copyright date:
- 2019
- Notes:
- © The Author(s) 2019. This paper was presented at the 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA, June 2019. The final published version and supplementary materials are available online from Proceedings of Machine Learning Research at: http://proceedings.mlr.press/v97/paul19a.html