Thesis

Asymptotic analysis of deep learning algorithms

Abstract:

We investigate the asymptotic properties of deep residual networks as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ordinary differential equation (ODE), a stochastic differential equation (SDE), or neither. Furthermore, we derive the corresponding scaling limits for the backpropagation dynamics. Finally, we prove that in the case of a smooth activation function, the scaling regime arises as a consequence of using gradient descent. In particular, we prove linear convergence of gradient descent to a global minimum for the training of deep residual networks. We also show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite 2-variation.
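As a concrete illustration of the hidden-state dynamics discussed above, the following minimal sketch runs the scaled residual recursion h_{k+1} = h_k + L^{-beta} f(h_k, W_k) for a deep network. The activation, weight distribution, depth, and exponent beta below are illustrative assumptions rather than the thesis's exact construction; beta = 1 reproduces the Euler scheme implicitly assumed in the neural ODE literature, while beta = 1/2 with i.i.d. weights gives updates of diffusive size.

```python
# Illustrative sketch (not the thesis's exact construction): a deep residual
# network whose hidden-state update is scaled by L**(-beta) for depth L.
# beta = 1.0 mimics the Euler discretization of an ODE dh/dt = f(h, theta(t));
# beta = 0.5 with i.i.d. weights mimics updates of diffusive (SDE-like) size.
import numpy as np

def forward(x, weights, beta):
    """Run the scaled residual recursion h_{k+1} = h_k + L^{-beta} * tanh(W_k h_k)."""
    L = len(weights)
    h = x.copy()
    for W in weights:
        h = h + L ** (-beta) * np.tanh(W @ h)
    return h

rng = np.random.default_rng(0)
d, L = 16, 1000                      # hidden width and depth (illustrative choices)
x = rng.normal(size=d)
weights = [rng.normal(scale=1.0 / np.sqrt(d), size=(d, d)) for _ in range(L)]

h_ode_like = forward(x, weights, beta=1.0)   # updates of size O(1/L): ODE-type regime
h_diffusive = forward(x, weights, beta=0.5)  # updates of size O(1/sqrt(L)): diffusive regime
print(np.linalg.norm(h_ode_like), np.linalg.norm(h_diffusive))
```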

This work also investigates the mean-field limit of path-homogeneous neural architectures. We prove convergence of the Wasserstein gradient flow to a global minimum, and we derive a generalization bound, based on the stability of the optimization algorithm, for 2-layer neural networks with ReLU activation.
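As an illustration of the mean-field setting in this second part, the sketch below trains a 2-layer ReLU network in the parameterization f(x) = (1/N) Σ_i a_i ReLU(w_i · x); as N grows, the gradient-descent dynamics of the empirical distribution of the neurons approximate a Wasserstein gradient flow. The synthetic data, squared loss, and learning-rate scaling are illustrative assumptions, not the thesis's exact setup.

```python
# Illustrative sketch of a mean-field 2-layer ReLU network:
#   f(x) = (1/N) * sum_i a_i * relu(w_i . x)
# Gradient descent on (a_i, w_i) moves the empirical measure of the neurons;
# in the large-N limit this evolution approximates a Wasserstein gradient flow.
# All hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 5, 200, 100                        # input dim, neurons, samples (illustrative)
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))           # synthetic regression target

a = rng.normal(size=N)                       # output weights
W = rng.normal(size=(N, d))                  # input weights

def predict(X, a, W):
    return np.maximum(X @ W.T, 0.0) @ a / N  # mean-field 1/N output scaling

lr = 0.1
for step in range(500):
    pre = X @ W.T                            # (n, N) pre-activations
    act = np.maximum(pre, 0.0)
    resid = act @ a / N - y                  # prediction error
    grad_a = act.T @ resid / (N * n)
    grad_W = ((resid[:, None] * (pre > 0) * a).T @ X) / (N * n)
    # learning rate scaled by N so parameter updates are O(1) under the
    # 1/N output scaling (illustrative choice)
    a -= lr * N * grad_a
    W -= lr * N * grad_W

print("final MSE:", np.mean((predict(X, a, W) - y) ** 2))
```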


Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author

Contributors

Role:
Supervisor


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
