Thesis
Toward efficient deep learning with sparse neural networks
- Abstract:
Despite the tremendous success that deep learning has achieved in recent years, it remains challenging to deal with the excessive computational and memory costs involved in executing deep learning-based applications. To address this challenge, this thesis studies sparse neural networks, particularly their construction, initialization, and large-scale training, as a step toward efficient deep learning.
Firstly, this thesis addresses the problem of finding sparse neural networks by pruning. Network pruning is an effective methodology to sparsify neural networks, and yet existing approaches often introduce hyperparameters that either need to be tuned with expert knowledge or are based on ad-hoc intuitions, and they typically entail iterative training steps. As an alternative, this thesis begins by proposing an efficient pruning method that is applied to a neural network prior to training, in a single shot. The sparse neural networks obtained with this method, once trained, exhibit state-of-the-art performance on various image classification tasks.
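One way to read the "single shot" idea concretely is as a saliency-based criterion evaluated on one mini-batch before any training. The following is a minimal PyTorch-style sketch under that assumption; the function name, the |gradient × weight| score, and the global threshold are illustrative choices, not necessarily the thesis's exact procedure.

```python
import torch

def single_shot_prune(model, loss_fn, batch, sparsity=0.9):
    """Score every weight on a single mini-batch and keep only the
    top (1 - sparsity) fraction, before any training takes place."""
    inputs, targets = batch
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Saliency of each connection: |gradient * weight|, computed once.
    scores = {name: (p.grad * p.detach()).abs()
              for name, p in model.named_parameters()
              if p.grad is not None and p.dim() > 1}

    # Global threshold so that only the k highest-scoring weights survive.
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int((1.0 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k).values[-1]

    # Binary masks to be applied to the weights before (and during) training.
    return {name: (s >= threshold).float() for name, s in scores.items()}
```

In practice the returned masks would be multiplied into the corresponding weight tensors before training and re-applied after each optimizer step so that pruned connections remain zero.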
Albeit efficient, this approach of pruning at initialization is not well understood; it remains unclear exactly why it can be effective. This thesis then extends the method by developing a new perspective from which the problem of finding trainable sparse neural networks is approached based on network initialization. Since initialization is a key to the success of finding and training sparse neural networks, this thesis proposes a sufficient initialization condition that can be satisfied easily with a simple optimization step and that, once achieved, accelerates the training of sparse neural networks quite significantly.
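As an illustration of how an initialization condition might be satisfied "with a simple optimization step", the sketch below nudges each weight matrix toward approximate orthogonality by minimizing a Frobenius-norm penalty with a few gradient steps. The specific condition, the function name, and the hyperparameters are assumptions for exposition, not the condition proposed in the thesis.

```python
import torch

def enforce_orthogonality(weights, num_steps=200, lr=0.1):
    """Minimize ||W^T W - I||_F^2 for each 2-D weight tensor with a few
    steps of gradient descent, as a stand-in 'simple optimization step'
    applied at initialization. The caller re-applies any sparsity masks
    after each step so that pruned connections stay at zero."""
    params = [W.clone().requires_grad_(True) for W in weights]
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        loss = sum(((W.t() @ W
                     - torch.eye(W.shape[1], device=W.device)) ** 2).sum()
                   for W in params)
        loss.backward()
        opt.step()
    return [W.detach() for W in params]
```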
While sparse neural networks can be obtained by pruning at initialization, there has been little study of the subsequent training of these sparse networks. This thesis lastly concentrates on studying data parallelism -- a straightforward approach to speeding up neural network training by parallelizing it on a distributed computing system -- under the influence of sparsity. To this end, the effects of data parallelism and sparsity are first measured accurately through extensive experiments accompanied by metaparameter search. This thesis then establishes theoretical results that precisely account for these effects, which have previously been addressed only partially and empirically and thus remained debatable.
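The measurement protocol described above can be pictured as follows: for each batch size (the degree of data parallelism) and sparsity level, search over training metaparameters such as the learning rate, and record the fewest steps needed to reach a fixed target metric. The sketch below assumes a user-supplied train_fn and illustrates only the bookkeeping; it is not the thesis's experimental code.

```python
def steps_to_target(train_fn, batch_sizes, learning_rates,
                    sparsity=0.9, target_error=0.1):
    """For each batch size, search over learning rates and keep the
    smallest number of steps at which `train_fn` reaches the target.
    `train_fn(batch_size, lr, sparsity, target_error)` is assumed to
    return a step count, or None if the target is never reached."""
    best = {}
    for bs in batch_sizes:
        counts = [train_fn(bs, lr, sparsity, target_error)
                  for lr in learning_rates]
        counts = [c for c in counts if c is not None]
        best[bs] = min(counts) if counts else None
    return best
```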
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Pubs id:
- 2044869
- Local pid:
- pubs:2044869
- Deposit date:
- 2020-11-21
- Title:
- Toward efficient deep learning with sparse neural networks
- DOI:
- 10.5287/ora-kq1p0n8vd-2
- Created date:
- 2024-12-02
- Title:
- Toward efficient deep learning with sparse neural networks
- DOI:
- 10.5287/ora-kq1p0n8vd-1
- Created date:
- 2024-12-02
Terms of use
- Copyright holder:
- Lee, N
- Copyright date:
- 2020