Conference item
GradientDICE: rethinking generalized offline estimation of stationary values
- Abstract:
- We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020), the current state of the art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity is introduced into the parameterization of the optimization variable to ensure positivity, so primal-dual algorithms are not guaranteed to find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction, resulting from GenDICE’s original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE’s use of divergence, such that nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
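The density ratio described in the abstract above can be illustrated in the simplest tabular, undiscounted setting. The sketch below is not GradientDICE itself, which estimates the ratio from off-policy samples without access to the transition model. It assumes the transition matrix induced by the target policy is known, and every name and number in it (`P_TARGET`, `D_SAMPLING`, `stationary_distribution`) is hypothetical. For an irreducible chain, the Perron-Frobenius theorem guarantees a unique, strictly positive stationary distribution; because this example chain is also aperiodic, plain power iteration converges to it.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for the row vector d satisfying d = d P.

    P is a row-stochastic transition matrix given as a list of lists.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of d <- d P, followed by renormalisation.
        new = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [x / total for x in new]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


# Hypothetical 3-state chain induced by a target policy
# (irreducible and aperiodic, so the stationary distribution is unique).
P_TARGET = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]

# Hypothetical sampling (behaviour) distribution over the same states.
D_SAMPLING = [0.5, 0.3, 0.2]

d_target = stationary_distribution(P_TARGET)

# Density ratio between the target state distribution and the
# sampling distribution, one entry per state.
ratio = [dt / dm for dt, dm in zip(d_target, D_SAMPLING)]
```

By construction the ratio is positive and averages to one under the sampling distribution, which is the kind of constraint a sample-based estimator of this quantity has to respect.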
Access Document
- Publication website:
- http://proceedings.mlr.press/v119/zhang20r.html
Authors
- Publisher:
- Journal of Machine Learning Research
- Host title:
- International Conference on Machine Learning, 13-18 July 2020, Virtual
- Series:
- Proceedings of Machine Learning Research
- Series number:
- 119
- Publication date:
- 2020-11-21
- Acceptance date:
- 2020-06-01
- Event title:
- 37th International Conference on Machine Learning (ICML 2020)
- Event location:
- Virtual
- Event website:
- https://icml.cc/Conferences/2020
- Event start date:
- 2020-07-12
- Event end date:
- 2020-07-18
- ISSN:
- 2640-3498
- Language:
- English
- Pubs id:
- 1118780
- Local pid:
- pubs:1118780
- Deposit date:
- 2020-07-15
Terms of use
- Copyright holder:
- Zhang, S et al.
- Copyright date:
- 2020
- Rights statement:
- © 2020 The Authors.
- Notes:
- This paper was presented at the 37th International Conference on Machine Learning (ICML 2020), 12-18 July 2020. This is the accepted manuscript version of the paper. The final version is available online from PMLR at: http://proceedings.mlr.press/v119/zhang20r.html