Journal article
Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems
- Abstract:
- We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 554.6KB)
- Publisher copy:
- 10.1137/22M1533517
Authors
- Publisher:
- Society for Industrial and Applied Mathematics
- Journal:
- SIAM Journal on Control and Optimization
- Volume:
- 62
- Issue:
- 2
- Pages:
- 1060 - 1092
- Publication date:
- 2024-03-22
- Acceptance date:
- 2024-01-05
- DOI:
- 10.1137/22M1533517
- EISSN:
-
1095-7138
- ISSN:
-
0363-0129
- Language:
-
English
- Keywords:
- Pubs id:
-
1595388
- Local pid:
-
pubs:1595388
- Deposit date:
-
2024-01-06
Terms of use
- Copyright holder:
- Society for Industrial and Applied Mathematics
- Copyright date:
- 2024
- Rights statement:
- © 2024 Society for Industrial and Applied Mathematics.
- Notes:
- This is the accepted manuscript version of the article. The final version is available online from Society for Industrial and Applied Mathematics at: https://dx.doi.org/10.1137/22M1533517