Journal article

Fast policy learning for linear-quadratic control with entropy regularization

Abstract:

This paper proposes and analyzes two new policy learning methods, regularized policy gradient and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization. Assuming access to exact policy evaluation, both methods are proved to converge linearly to the optimal policy of the regularized LQC problem. Moreover, the IPO method achieves a superlinear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy for a reinforcement learning (RL) problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to converge at a superlinear rate if the two environments are sufficiently close. Model-free versions of these policy-based methods are also discussed. The performance of the proposed algorithms is illustrated by numerical examples.

Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1137/23m1621071

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Oxford college:
The Queen's College
Role:
Author
ORCID:
0000-0003-4293-3450


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/Y028872/1


Publisher:
Society for Industrial and Applied Mathematics
Journal:
SIAM Journal on Control and Optimization
Volume:
64
Issue:
1
Pages:
124-151
Publication date:
2026-01-09
Acceptance date:
2025-09-02
DOI:
10.1137/23m1621071
EISSN:
1095-7138
ISSN:
0363-0129

