Journal article

Activation function design for deep networks: linearity and effective initialisation

Abstract:
The activation function deployed in a deep neural network has a great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b² of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
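As a rough illustration of the design principle described in the abstract (not code from the paper itself), the sketch below defines a hypothetical activation that is exactly the identity on a linear region [-a, a] around the origin and saturates smoothly outside it. The function name, the choice of tanh as the outer nonlinearity, and the example values of the half-width `a` are all assumptions for illustration; the abstract only states that the linear region should be sufficiently large relative to the bias variance σ_b².

```python
import numpy as np

def linearised_tanh(x, a=1.0):
    """Hypothetical activation: identity on the linear region [-a, a],
    with a shifted tanh outside it, matched so that both the value and
    the slope are continuous at +/-a.  The half-width `a` is the knob
    the abstract suggests choosing relative to the bias variance
    sigma_b^2 of the random initialisation (an assumption here)."""
    x = np.asarray(x, dtype=float)
    return np.where(
        np.abs(x) <= a,
        x,                                            # linear region around the origin
        np.sign(x) * (a + np.tanh(np.abs(x) - a)),    # smooth saturation outside
    )

# Example: a larger half-width keeps more pre-activations in the regime
# where each layer acts linearly at initialisation.
z = np.random.normal(0.0, 1.0, size=5)
print(linearised_tanh(z, a=0.5))
print(linearised_tanh(z, a=2.0))
```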
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1016/j.acha.2021.12.010

Authors

Role:
Author
ORCID:
0000-0002-1838-8950

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author

Publisher:
Elsevier
Journal:
Applied and Computational Harmonic Analysis
Volume:
59
Pages:
117-154
Publication date:
2022-01-04
Acceptance date:
2021-12-26
DOI:
10.1016/j.acha.2021.12.010
ISSN:
1063-5203


Language:
English
Keywords:
Pubs id:
1230287
Local pid:
pubs:1230287
Deposit date:
2022-01-07
