Thesis
Asymptotic analysis of deep learning algorithms
- Abstract:
We investigate the asymptotic properties of deep residual networks as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither. Furthermore, we derive the corresponding scaling limits for the backpropagation dynamics. Finally, we prove that in the case of a smooth activation function, the scaling regime arises as a consequence of using gradient descent. In particular, we prove linear convergence of gradient descent to a global minimum for the training of deep residual networks. We also show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite 2-variation.
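To make the scaling regimes mentioned above concrete, the display below sketches a generic depth-L residual recursion together with the two canonical limits the paragraph alludes to; the notation (depth L, hidden state h_k, weights θ_k) is illustrative and not taken from the thesis itself.

```latex
% Illustrative depth-L residual recursion and two canonical scaling limits
% (generic notation, assumed for this sketch; not the thesis's own formulation).
\begin{aligned}
  &h_{k+1} = h_k + \delta_L\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1,\\[2pt]
  &\text{neural ODE regime: } \delta_L = \tfrac{1}{L},\ \ \theta_k = \theta(k/L)\ \text{smooth}
    \;\Longrightarrow\; \tfrac{\mathrm{d}}{\mathrm{d}t} H_t = f\bigl(H_t, \theta(t)\bigr),\\[2pt]
  &\text{diffusive regime: } \delta_L = \tfrac{1}{\sqrt{L}},\ \ \theta_k\ \text{i.i.d., mean zero}
    \;\Longrightarrow\; \mathrm{d}H_t = \Sigma(H_t)\,\mathrm{d}W_t \ \text{(an SDE limit)}.
\end{aligned}
```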
This work also investigates the mean-field limit of path-homogeneous neural architectures. We prove convergence of the Wasserstein gradient flow to a global minimum, and we derive a generalization bound, based on the stability of the optimization algorithm, for 2-layer neural networks with ReLU activation.
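Similarly, the mean-field setting behind this paragraph can be sketched for a 2-layer ReLU network as follows; the notation (N neurons, parameters (a_i, w_i), measure μ) is again generic and only assumed here for illustration.

```latex
% 2-layer ReLU network in mean-field parametrization and its infinite-width limit
% (illustrative notation, assumed for this sketch).
f_N(x) = \frac{1}{N} \sum_{i=1}^{N} a_i\, \sigma(w_i^{\top} x), \qquad \sigma(u) = \max(u, 0),
\qquad
f_\mu(x) = \int a\, \sigma(w^{\top} x)\, \mu(\mathrm{d}a, \mathrm{d}w).
```

In this picture, gradient descent on the parameters (a_i, w_i) corresponds, as N grows, to a Wasserstein gradient flow of the risk μ ↦ R(f_μ) over the space of probability measures.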
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2024-06-20
Terms of use
- Copyright holder:
- Alain Rossier
- Copyright date:
- 2023