
Journal article

Fast policy learning for linear-quadratic control with entropy regularization

Abstract:: This paper proposes and analyzes two new policy learning methods, regularized policy gradient and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization. Assuming access to exact policy evaluation, both approaches are proved to converge linearly to optimal policies of the regularized LQC problem. Moreover, the IPO method achieves a superlinear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy for a reinforcement learning (RL) problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to converge at a superlinear rate if the two environments are sufficiently close. A model-free version of the policy-based methods is also discussed. The performance of the proposed algorithms is supported by numerical examples.
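As an illustration only, the Python sketch below alternates exact policy evaluation and greedy policy improvement for a scalar, discounted, entropy-regularized LQ problem. It is not the authors' regularized policy gradient or IPO algorithm. The model x_{t+1} = a x_t + b u_t + w_t, the stage cost q x^2 + r u^2 + tau log pi(u|x), the Gaussian policy class Normal(-k x, var), and all function names are assumptions chosen for the example. Under these assumptions the value function is p x^2 + const, the quadratic coefficient p does not depend on the exploration variance, and the improved policy is Gaussian with gain gamma a b p / (r + gamma b^2 p) and variance tau / (2 (r + gamma b^2 p)).

```python
# Illustrative sketch (assumed model, not the paper's algorithm):
#   dynamics   x_{t+1} = a*x_t + b*u_t + w_t   (zero-mean noise w_t)
#   stage cost q*x**2 + r*u**2 + tau*log pi(u|x), discounted by gamma
#   policies   pi(.|x) = Normal(mean=-k*x, variance=var)


def evaluate(a, b, q, r, gamma, k):
    """Exact policy evaluation: quadratic coefficient p of V(x) = p*x**2 + c.

    p solves p = q + r*k**2 + gamma*(a - b*k)**2 * p. The exploration
    variance, the noise and the entropy term only shift the constant c,
    so they do not appear here.
    """
    rho = gamma * (a - b * k) ** 2
    if rho >= 1.0:
        raise ValueError("gain k is not stabilizing for the discounted problem")
    return (q + r * k ** 2) / (1.0 - rho)


def improve(a, b, r, gamma, tau, p):
    """Greedy entropy-regularized policy with respect to V(x) = p*x**2 + c.

    Minimizing E[r*u**2 + gamma*p*(a*x + b*u)**2] + tau*E[log pi] over
    densities pi gives a Gaussian with mean -k_new*x and variance var_new.
    """
    m = r + gamma * b ** 2 * p
    k_new = gamma * a * b * p / m
    var_new = tau / (2.0 * m)
    return k_new, var_new


def policy_iteration(a, b, q, r, gamma, tau, k0, iters=10):
    """Alternate exact evaluation and improvement from a stabilizing gain k0."""
    k, history = k0, []
    for _ in range(iters):
        p = evaluate(a, b, q, r, gamma, k)
        k, var = improve(a, b, r, gamma, tau, p)
        history.append((k, var, p))
    return history


def riccati_residual(a, b, q, r, gamma, p):
    """Residual of the scalar discounted Riccati equation at p (0 at the optimum)."""
    m = r + gamma * b ** 2 * p
    return q + gamma * a ** 2 * p - (gamma * a * b * p) ** 2 / m - p


if __name__ == "__main__":
    # Example with a = 1.2 > 1; k0 = 1.0 is stabilizing because
    # gamma * (a - b*k0)**2 = 0.9 * 0.04 = 0.036 < 1.
    history = policy_iteration(1.2, 1.0, 1.0, 1.0, 0.9, 0.1, k0=1.0)
    for i, (k, var, p) in enumerate(history):
        res = riccati_residual(1.2, 1.0, 1.0, 1.0, 0.9, p)
        print(f"iter {i}: k={k:.10f} var={var:.6f} p={p:.10f} residual={res:.2e}")
```

Starting from a stabilizing gain, the value coefficients p in such runs decrease monotonically and reach the discounted Riccati fixed point within a few iterations. This fast local convergence is the classical Newton-type behaviour of exact policy iteration for LQ problems (Hewer's algorithm); the paper's IPO update, its matrix-valued setting and its rates may differ from this sketch.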

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Guo, X., Li, X., & Xu, R. (2026). Fast policy learning for linear-quadratic control with entropy regularization. SIAM Journal on Control and Optimization, 64(1), 124–151.

MLA Style

Guo, X, et al. “Fast Policy Learning for Linear-Quadratic Control with Entropy Regularization.” SIAM Journal on Control and Optimization, vol. 64, no. 1, 2026, pp. 124–51.

Chicago Style

Guo, X, X Li, and R Xu. 2026. “Fast Policy Learning for Linear-Quadratic Control with Entropy Regularization.” SIAM Journal on Control and Optimization 64 (1): 124–51.

Access Document

Publisher copy:: 10.1137/23m1621071

Authors

Guo, X

Role:: Author

Li, X

Institution:: University of Oxford
Division:: MPLS
Department:: Mathematical Institute
Role:: Author

Xu, R

Institution:: University of Oxford
Division:: MPLS
Department:: Mathematical Institute
Oxford college:: The Queen's College
Role:: Author
ORCID:: 0000-0003-4293-3450

Funder:: Engineering and Physical Sciences Research Council

Funder identifier:: https://ror.org/0439y7842
Grant:: EP/Y028872/1

Publisher:: Society for Industrial and Applied Mathematics
Journal:: SIAM Journal on Control and Optimization
Volume:: 64
Issue:: 1
Pages:: 124-151
Publication date:: 2026-01-09
Acceptance date:: 2025-09-02
DOI:: 10.1137/23m1621071
EISSN:: 1095-7138
ISSN:: 0363-0129

Language:: English
Keywords:: linear-quadratic control; reinforcement learning; policy gradient method; iterative policy optimization; entropy regularization; transfer learning
Pubs id:: 2357487
Local pid:: pubs:2357487
Deposit date:: 2026-01-23
ARK identifier:: ark:/29072/ora_329aed6809654d24ba858609d8f0fd6d

Terms of use

Copyright holder:: Society for Industrial and Applied Mathematics
Copyright date:: 2026
Rights statement:: © 2026 Society for Industrial and Applied Mathematics

Licence:: Terms and Conditions of Use for Oxford University Research Archive
