Thesis
Wide deep neural networks
- Abstract:
Deep neural networks have had tremendous success in a wide range of applications, where they achieve state-of-the-art performance. Their success can generally be attributed to three main pillars: their natural back-propagation structure, which allows time- and resource-efficient gradient computation; recent advances in optimization theory, which have led to the development of fast training algorithms; and the availability of computationally efficient software (neural network frameworks such as PyTorch and TensorFlow) and hardware (Graphics Processing Units (GPUs) and, more recently, Tensor Processing Units (TPUs)).
Deep neural networks are now the model of choice for many practitioners. As a result, there is growing research interest in their theoretical properties. Classical results from approximation theory ensure that neural networks, even with a single hidden layer, are universal approximators, provided the model is large enough. From an optimization point of view, the loss surface of a deep neural network is generally highly non-convex, which severely limits the results from optimization theory that can be applied to such models. Therefore, little is known about the local minima, the saddle points, and the convergence of gradient-based methods (e.g. Stochastic Gradient Descent) for these models. Another interesting and understudied research topic is that of randomly initialized neural networks. Indeed, random neural networks provide a compelling framework that offers, in many cases, a simplified shortcut to understanding the theoretical properties of neural networks at initialization and of Bayesian neural networks. Against this backdrop, the research presented in this thesis focuses on the theoretical properties of randomly initialized wide deep neural networks. It provides a comprehensive analysis of these models at initialization, leveraging a duality between wide random neural networks and Gaussian processes. In particular, the research presented here pays careful attention to the role of the initialization hyperparameters, the activation function, and the neural architecture in the behaviour of these models. This level of depth allows for the derivation of principled guidelines for the training and design of deep neural networks.
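The duality between wide random networks and Gaussian processes can be illustrated numerically. The sketch below is illustrative only: the function name, parameters, and choice of a one-hidden-layer ReLU network are assumptions for this example, not code or settings from the thesis. It samples the output of a randomly initialized network at a fixed input under standard i.i.d. Gaussian initialization and shows that, as the width grows, the output distribution approaches a Gaussian, which is the central-limit-theorem intuition behind the correspondence.

```python
import numpy as np

def random_relu_network_outputs(width, n_samples, x, sigma_w=1.0, seed=0):
    """Sample outputs of a one-hidden-layer random ReLU network at input x.

    Weights are drawn i.i.d. N(0, sigma_w^2 / fan_in), a standard
    initialization under which the infinite-width limit is a Gaussian
    process. (Illustrative helper; names are not from the thesis.)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    outs = np.empty(n_samples)
    for i in range(n_samples):
        # Fresh random weights for each draw: we sample the distribution
        # over networks, then read off the scalar output at x.
        W1 = rng.normal(0.0, sigma_w / np.sqrt(d), size=(width, d))
        W2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=(1, width))
        h = np.maximum(W1 @ x, 0.0)  # hidden-layer ReLU activations
        outs[i] = (W2 @ h).item()
    return outs

x = np.ones(3)
narrow = random_relu_network_outputs(width=2, n_samples=2000, x=x)
wide = random_relu_network_outputs(width=2000, n_samples=2000, x=x)

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return (z ** 4).mean() - 3.0

# A Gaussian has excess kurtosis 0; the narrow network's output is
# visibly heavy-tailed, while the wide network's is close to Gaussian.
print(excess_kurtosis(narrow), excess_kurtosis(wide))
```

The same comparison can be run at several inputs at once, in which case the wide-network outputs become approximately jointly Gaussian with a covariance determined by the activation function, which is the Gaussian-process kernel the abstract alludes to.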
Authors
Contributors
- Role:
- Supervisor
- ORCID:
- 0000-0002-7662-419X
- Role:
- Supervisor
- ORCID:
- 0000-0002-0998-6174
- Funder identifier:
- http://dx.doi.org/10.13039/501100000266
- Grant:
- 1929843
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2021-06-23
Terms of use
- Copyright holder:
- Hayou, S
- Copyright date:
- 2021