Conference item
Average-reward off-policy policy evaluation with function approximation
- Abstract:
- We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad (Sutton & Barto, 2018). To address the deadly triad, we propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting. In terms of estimating the differential value function, the algorithms are the first convergent off-policy linear function approximation algorithms. In terms of estimating the reward rate, the algorithms are the first convergent off-policy linear function approximation algorithms that do not require estimating the density ratio. We demonstrate empirically the advantage of the proposed algorithms, as well as their nonlinear variants, over a competitive density-ratio-based approach, in a simple domain as well as challenging robot simulation tasks.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Supplementary materials (951.6KB)
- Version of record (882.8KB)
- Publication website:
- http://proceedings.mlr.press/v139/zhang21u.html
Authors
- Publisher:
- PMLR
- Host title:
- Proceedings of the 38th International Conference on Machine Learning
- Volume:
- 139
- Pages:
- 12578-12588
- Series:
- Proceedings of Machine Learning Research
- Publication date:
- 2021-07-21
- Acceptance date:
- 2021-05-08
- Event title:
- 38th International Conference on Machine Learning (ICML 2021)
- Event location:
- Virtual Event
- Event website:
- https://icml.cc/
- Event start date:
- 2021-07-18
- Event end date:
- 2021-07-24
- ISSN:
- 2640-3498
- Language:
- English
- Pubs id:
- 1187447
- Local pid:
- pubs:1187447
- Deposit date:
- 2021-07-24
Terms of use
- Copyright holder:
- Zhang et al.
- Copyright date:
- 2021
- Rights statement:
- Copyright 2021 by the author(s).
- Licence:
- CC Attribution (CC BY)