
Thesis

Optimisation for efficient deep learning

Abstract:

Over the past ten years there has been a huge advance in the performance of deep neural networks on many supervised learning tasks. Over this period these models have redefined the state of the art numerous times on many classic machine vision and natural language processing benchmarks. Deep neural networks have also found their way into many real-world applications, including chatbots, art generation, voice-activated virtual assistants, surveillance, and medical diagnosis systems. Much of the improved performance of these models can be attributed to an increase in scale, which in turn has raised computation and energy costs.

In this thesis we detail approaches to reducing the cost of deploying deep neural networks in various settings. We first focus on training efficiency, and to that end we present two optimisation techniques that produce high-accuracy models without extensive tuning. These optimisers have only a single fixed maximal step size hyperparameter to cross-validate, and we demonstrate that they outperform other comparable methods in a wide range of settings. Because they do not require the onerous process of finding a good learning rate schedule, which often involves training many versions of the same network, they reduce the computation needed. The first of these optimisers is a novel bundle method designed for the interpolation setting. The second demonstrates the effectiveness of a Polyak-like step size combined with an online estimate of the optimal loss value in the non-interpolating setting.
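A Polyak-like step size with a fixed maximal step can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the thesis maintains the optimal-loss estimate online, whereas here it is simply passed in, and the function and variable names are ours.

```python
import numpy as np

def polyak_step(x, grad, loss, loss_est, gamma_max):
    """One Polyak-style gradient step with a capped step size.

    Step size: min(gamma_max, (loss - loss_est) / ||grad||^2), where
    loss_est stands in for the (possibly unknown) optimal loss value.
    """
    g_sq = float(np.dot(grad, grad))
    if g_sq == 0.0:
        return x  # stationary point: nothing to do
    step = min(gamma_max, max(loss - loss_est, 0.0) / g_sq)
    return x - step * grad

# Minimise f(x) = 0.5 * ||x||^2, whose optimal value 0 is known exactly.
x = np.array([2.0, -1.0])
for _ in range(60):
    x = polyak_step(x, grad=x, loss=0.5 * float(x @ x), loss_est=0.0, gamma_max=1.0)
```

On this quadratic the Polyak step works out to exactly 0.5 at every iterate, so the method converges geometrically without any tuned schedule, which is the appeal of this family of step sizes.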

Next, we turn our attention to training efficient binary networks with both binary parameters and binary activations. With the right implementation, fully binary networks are highly efficient at inference time, as they can replace the majority of operations with cheaper bit-wise alternatives. This makes them well suited to lightweight or embedded applications. Because these models are discrete, conventional gradient-based training approaches are not viable. We present a simple and effective alternative to the existing optimisation techniques for these models.
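The bit-wise savings mentioned above can be illustrated with a toy sketch (the function name and encoding are ours, not from the thesis): a dot product of two {-1, +1} vectors reduces to one XOR plus a popcount.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors with entries in {-1, +1},
    packed as integers (bit 1 encodes +1, bit 0 encodes -1).

    Matching bits contribute +1 and differing bits -1, so the dot
    product equals n - 2 * popcount(a XOR b): one XOR and one popcount
    replace n multiply-accumulate operations.
    """
    mask = (1 << n) - 1
    return n - 2 * bin((a_bits ^ b_bits) & mask).count("1")

# a = [+1, -1, +1] -> 0b101, b = [+1, +1, -1] -> 0b011
print(binary_dot(0b101, 0b011, 3))  # -> -1
```

On hardware, the same idea is applied to whole machine words at once (e.g. 64 weight/activation pairs per XNOR-popcount), which is where the inference-time efficiency of fully binary networks comes from.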

Authors


Division:
MPLS
Department:
Engineering Science
Role:
Author

Contributors

Role: Contributor
Role: Supervisor (×3)
Institution: University of Oxford
Division: MPLS
Department: Engineering Science
Role: Supervisor
ORCID: 0000-0002-8945-8573


Funder identifier:
http://dx.doi.org/10.13039/501100000726
Grant:
EP/L015897/1
Programme:
Russell Studentship Agreement


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2023-07-09
