GradientDICE: rethinking generalized offline estimation of stationary values

Zhang, S; Liu, B; Whiteson, S

Conference item

Abstract:: We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020), the current state-of-the-art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so primal-dual algorithms are not guaranteed to find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction, resulting from GenDICE’s original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE’s use of divergence, such that nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
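
As a rough illustration of the quantity the abstract describes (not the GradientDICE algorithm itself), the sketch below computes a density ratio in a small tabular setting. The 3-state transition matrix `P_pi`, the sampling distribution `d_mu`, and the helper `stationary_distribution` are hypothetical and are not taken from the paper. The sketch also assumes the undiscounted case, where the target policy's state distribution is the stationary distribution of an ergodic chain, which the Perron-Frobenius theorem guarantees is unique.

```python
import numpy as np


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for d with d = P^T d, where P is row-stochastic.

    Assumes the chain is irreducible and aperiodic, so that the
    stationary distribution is unique and the iteration converges.
    """
    n = P.shape[0]
    d = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(iters):
        d_next = P.T @ d              # one step of the chain
        d_next /= d_next.sum()        # guard against numerical drift
        if np.abs(d_next - d).max() < tol:
            return d_next
        d = d_next
    return d


# Hypothetical state-transition matrix induced by the target policy.
P_pi = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])

# Hypothetical sampling distribution over the same three states.
d_mu = np.array([0.5, 0.3, 0.2])

d_pi = stationary_distribution(P_pi)  # approximately [0.4, 0.4, 0.2]
tau = d_pi / d_mu                     # density ratio, approximately [0.8, 1.333, 1.0]

print("stationary distribution:", d_pi)
print("density ratio:", tau)
```

Here the ratio is read off from a known model. An off-policy estimator such as GradientDICE instead has to learn it from samples drawn from the sampling distribution, without access to `P_pi`.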

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Zhang, S., Liu, B., & Whiteson, S. (2020). GradientDICE: rethinking generalized offline estimation of stationary values. International Conference on Machine Learning, 13-18 July 2020, Virtual.

MLA Style

Zhang, S., et al. “GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.” International Conference on Machine Learning, 13-18 July 2020, Virtual, Journal of Machine Learning Research, 2020.

Chicago Style

Zhang, S, B Liu, and S Whiteson. 2020. “GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.” In International Conference on Machine Learning, 13-18 July 2020, Virtual. Proceedings of Machine Learning Research. Journal of Machine Learning Research.

Access Document

Files:: zhangicml20a.pdf

(Accepted manuscript, 1.4MB)

Publication website:: http://proceedings.mlr.press/v119/zhang20r.html

Authors

Zhang, S

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Liu, B

Role:: Author

Whiteson, S

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Funding

European Commission

Grant:: 637713

Bibliographic Details

Publisher:: Journal of Machine Learning Research
Host title:: International Conference on Machine Learning, 13-18 July 2020, Virtual
Series:: Proceedings of Machine Learning Research
Series number:: 119
Publication date:: 2020-11-21
Acceptance date:: 2020-06-01
Event title:: 37th International Conference on Machine Learning (ICML 2020)
Event location:: Virtual
Event website:: https://icml.cc/Conferences/2020
Event start date:: 2020-07-12
Event end date:: 2020-07-18
ISSN:: 2640-3498

Item Description

Language:: English
Keywords:: FFR
Pubs id:: 1118780
Local pid:: pubs:1118780
Deposit date:: 2020-07-15

Terms of use

Copyright holder:: Zhang, S et al.
Notes:: This paper was presented at the 37th International Conference on Machine Learning (ICML 2020), 12-18 July 2020. This is the accepted manuscript version of the paper. The final version is available online from PMLR at: http://proceedings.mlr.press/v119/zhang20r.html

Licence:: Terms and Conditions of Use for Oxford University Research Archive
