Thesis
Wide deep neural networks
- Abstract:
Deep neural networks have had tremendous success in a wide range of applications, where they achieve state-of-the-art performance. Their success can generally be attributed to three main pillars: their natural back-propagation structure, which allows time- and resource-efficient gradient computation; recent advances in optimization theory, which have led to the development of fast training algorithms; and the availability of computationally efficient software (neural network frameworks such as PyTorch and TensorFlow) and hardware (Graphics Processing Units (GPUs) and, more recently, Tensor Processing Units (TPUs)).
Deep neural networks are now the model of choice for many practitioners. As a result, there is growing research interest in their theoretical properties. Classical results from approximation theory ensure that neural networks, even with a single hidden layer, are universal approximators, provided the model is large enough. From an optimization point of view, the loss surface of a deep neural network is generally highly non-convex, which severely limits the results from optimization theory that can be applied to such models. Therefore, little is known about the local minima, the saddle points, and the convergence of gradient-based methods (e.g. Stochastic Gradient Descent) for these models. Another interesting and understudied research topic is that of randomly initialized neural networks. Indeed, random neural networks provide a compelling framework that offers, in many cases, a simplified shortcut to understanding the theoretical properties of neural networks at initialization and of Bayesian neural networks. Against this backdrop, the research presented in this thesis focuses on the theoretical properties of randomly initialized wide deep neural networks. It provides a comprehensive analysis of these models at initialization, leveraging a duality between wide random neural networks and Gaussian processes. In particular, the research presented here pays careful attention to the role of the initialization hyperparameters, the activation function, and the neural architecture in the behaviour of these models. This level of depth allows for the derivation of principled guidelines for the training and design of deep neural networks.
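The duality between wide random networks and Gaussian processes can be illustrated numerically. The sketch below is illustrative only: the function name, parameters, and choice of a one-hidden-layer ReLU network are assumptions for this example, not code or settings from the thesis. It samples the output of a randomly initialized network at a fixed input under standard i.i.d. Gaussian initialization and shows that, as the width grows, the output distribution approaches a Gaussian, which is the central-limit-theorem intuition behind the correspondence.

```python
import numpy as np

def random_relu_network_outputs(width, n_samples, x, sigma_w=1.0, seed=0):
    """Sample outputs of a one-hidden-layer random ReLU network at input x.

    Weights are drawn i.i.d. N(0, sigma_w^2 / fan_in), a standard
    initialization under which the infinite-width limit is a Gaussian
    process. (Illustrative helper; names are not from the thesis.)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    outs = np.empty(n_samples)
    for i in range(n_samples):
        # Fresh random weights for each draw: we sample the distribution
        # over networks, then read off the scalar output at x.
        W1 = rng.normal(0.0, sigma_w / np.sqrt(d), size=(width, d))
        W2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=(1, width))
        h = np.maximum(W1 @ x, 0.0)  # hidden-layer ReLU activations
        outs[i] = (W2 @ h).item()
    return outs

x = np.ones(3)
narrow = random_relu_network_outputs(width=2, n_samples=2000, x=x)
wide = random_relu_network_outputs(width=2000, n_samples=2000, x=x)

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return (z ** 4).mean() - 3.0

# A Gaussian has excess kurtosis 0; the narrow network's output is
# visibly heavy-tailed, while the wide network's is close to Gaussian.
print(excess_kurtosis(narrow), excess_kurtosis(wide))
```

The same comparison can be run at several inputs at once, in which case the wide-network outputs become approximately jointly Gaussian with a covariance determined by the activation function, which is the Gaussian-process kernel the abstract alludes to.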
Authors
Contributors
- Role:
- Supervisor
- ORCID:
- 0000-0002-7662-419X
- Role:
- Supervisor
- ORCID:
- 0000-0002-0998-6174
- Funder identifier:
- http://dx.doi.org/10.13039/501100000266
- Grant:
- 1929843
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2021-06-23
Terms of use
- Copyright holder:
- Hayou, S
- Copyright date:
- 2021