Conference item
A baseline for any order gradient estimation in stochastic computation graphs
- Abstract:
- By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte Carlo estimator (DiCE) can generate correct estimates for the higher-order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first-order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample effici...
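For readers unfamiliar with the operator the abstract refers to, the core of DiCE is the "magic box" trick: an expression that evaluates to 1 in the forward pass but reinjects the score-function term under differentiation of any order. A minimal sketch in JAX follows (an illustration only, not the authors' implementation; the names `magic_box` and `surrogate` are chosen here for exposition):

```python
import jax
import jax.numpy as jnp

def magic_box(tau):
    """DiCE 'magic box': evaluates to 1, but carries the gradient of tau."""
    # exp(tau - stop_gradient(tau)) is exactly 1 in the forward pass,
    # while its derivative w.r.t. tau is exp(tau - stop_gradient(tau)),
    # i.e. 1 at evaluation, so multiplying an objective by it leaves the
    # value unchanged but makes dependence on tau visible to autodiff.
    return jnp.exp(tau - jax.lax.stop_gradient(tau))

def surrogate(tau):
    # Toy objective: a constant sampled cost that does not itself depend
    # on tau (here tau stands in for a summed log-probability).
    cost = 3.0
    return magic_box(tau) * cost

value = surrogate(2.0)            # forward value is unchanged: 3.0
grad = jax.grad(surrogate)(2.0)   # gradient picks up cost * 1 = 3.0
```

Because `magic_box` is built from ordinary differentiable primitives, the same surrogate can be differentiated repeatedly, which is what makes the estimator "any order".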
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Authors
Funding
NVIDIA
Bibliographic Details
- Publisher:
- Proceedings of Machine Learning Research
- Journal:
- Proceedings of Machine Learning Research
- Volume:
- 97
- Pages:
- 4343-4351
- Host title:
- Proceedings of Machine Learning Research
- Publication date:
- 2019-05-24
- Acceptance date:
- 2019-05-14
- Source identifiers:
- 998015
Item Description
- Pubs id:
- pubs:998015
- UUID:
- uuid:39df29a4-1e9a-4e06-a6a0-291dd673c682
- Local pid:
- pubs:998015
- Deposit date:
- 2019-05-14
Terms of use
- Copyright holder:
- Mao, J., et al.
- Copyright date:
- 2019
- Notes:
- © Mao, J., et al. Conference paper presented at the 36th International Conference on Machine Learning (ICML 2019), 10-15 June 2019, Long Beach, California. The final published version and supplementary materials are available online from Proceedings of Machine Learning Research at: http://proceedings.mlr.press/v97/mao19a.html