Thesis

Analysis of online learning algorithms in machine learning

Abstract:
In this thesis, we consider the problem of optimizing parameters over the stationary distribution of Markov decision processes, stochastic differential equations (SDEs), and stochastic partial differential equations (SPDEs). First, we study online actor-critic algorithms in reinforcement learning with tabular parametrization and prove that, under a time rescaling, the algorithm converges to ordinary differential equations (ODEs) as the number of updates becomes large. Convergence to the optimal strategies, together with a convergence rate, is established using a two-time-scale analysis that asymptotically decouples the critic ODE from the actor ODE. Next, under the same framework, we show that when both the actor and the critic are parameterized by single-layer neural networks, the actor-critic algorithm converges in distribution to a system of ODEs with random initial conditions as the number of hidden units and the number of training steps go to infinity. Convergence of the limit actor network to a stationary point is also established. Further, we develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of SDE models. The novel idea of our algorithm is that the gradient estimate is updated simultaneously using forward propagation of the SDE state derivatives and asymptotically converges to the direction of steepest descent. We rigorously prove convergence of the online forward propagation algorithm for linear SDE models and present numerical results for a range of mathematical finance applications. Finally, we establish the convergence of our algorithm for a class of nonlinear dissipative SDEs whose drift and volatility functions both depend on the parameters being optimized. We also demonstrate an application of our algorithm to Neural SPDEs.
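
As a rough illustration of the forward propagation idea described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a one-dimensional Ornstein-Uhlenbeck model dX = (theta - X) dt + sigma dW, whose stationary mean is theta, and an assumed objective J(theta) = (E[X] - beta)^2. The function name `online_forward_propagation`, the learning rate 1/(1 + t), the Euler discretization, and all default values are hypothetical choices made for this example. The derivative of the state with respect to theta is propagated forward in time alongside the state, and the parameter is updated at every step from the current state and derivative.

```python
import math
import random


def online_forward_propagation(beta=2.0, sigma=0.5, theta0=0.0,
                               dt=0.01, n_steps=20_000, seed=0):
    """Illustrative online update of theta for an assumed OU model.

    Model (assumption): dX = (theta - X) dt + sigma dW.
    Objective (assumption): J(theta) = (E_stationary[X] - beta)^2.
    """
    rng = random.Random(seed)
    theta = theta0
    x = 0.0        # SDE state X_t
    x_tilde = 0.0  # forward-propagated derivative dX_t / dtheta
    sqrt_dt = math.sqrt(dt)

    for k in range(n_steps):
        t = k * dt
        lr = 1.0 / (1.0 + t)  # decaying learning rate (assumed schedule)

        # Online gradient estimate built from the current state and derivative.
        grad_est = 2.0 * (x - beta) * x_tilde

        # Euler-Maruyama step for the state, using the current theta.
        dw = rng.gauss(0.0, 1.0) * sqrt_dt
        x_new = x + (theta - x) * dt + sigma * dw

        # Differentiating the drift (theta - X) in theta gives d(x_tilde) = (1 - x_tilde) dt.
        x_tilde_new = x_tilde + (1.0 - x_tilde) * dt

        # Parameter update happens at the same step as the state updates.
        theta -= lr * grad_est * dt
        x, x_tilde = x_new, x_tilde_new

    return theta


if __name__ == "__main__":
    print(online_forward_propagation())
```

In this linear example the derivative process is deterministic and tends to 1, so the update reduces to a noisy descent that pulls theta toward beta. More general models would need a stochastic derivative process and further care in forming the gradient estimate.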

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Supervisor
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Supervisor
ORCID:
0000-0002-7426-4645


Funding agency for:
Wang, Z
Grant:
ECM#111665
Programme:
Mathematics of Random Systems CDT Scholarship


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
