Journal article
Fast policy learning for linear-quadratic control with entropy regularization
- Abstract:
-
This paper proposes and analyzes two new policy learning methods, regularized policy gradient and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization. Assuming access to exact policy evaluation, both approaches are proved to converge linearly to optimal policies of the regularized LQC problem. Moreover, the IPO method can achieve a superlinear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy for a reinforcement learning (RL) problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to converge at a superlinear rate if the two environments are sufficiently close. A model-free version of the policy-based methods is also discussed. The performance of the proposed algorithms is supported by numerical examples.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Publisher copy:
- 10.1137/23m1621071
Authors
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/Y028872/1
- Publisher:
- Society for Industrial and Applied Mathematics
- Journal:
- SIAM Journal on Control and Optimization
- Volume:
- 64
- Issue:
- 1
- Pages:
- 124-151
- Publication date:
- 2026-01-09
- Acceptance date:
- 2025-09-02
- DOI:
- EISSN:
-
1095-7138
- ISSN:
-
0363-0129
- Language:
-
English
- Pubs id:
-
2357487
- Local pid:
-
pubs:2357487
- Deposit date:
-
2026-01-23
Terms of use
- Copyright holder:
- Society for Industrial and Applied Mathematics
- Copyright date:
- 2026
- Rights statement:
- © 2026 Society for Industrial and Applied Mathematics