Thesis
Natural gradient algorithms for training deep neural networks
- Abstract:
In this thesis, we reduce the computational cost of a particular class of learning algorithms for Deep Neural Networks (DNNs; brain-inspired artificial intelligence models). Natural Gradient Descent (NGD) algorithms have favorable properties compared to stochastic gradient methods. K-FAC (Martens and Grosse, 2015) is a computationally feasible implementation of NGD for DNNs which retains some of NGD's desirable properties. However, K-FAC has cubic time complexity in layer size, owing to the computation of Kronecker-factor (K-factor) inverses, which results in slow run-times for large K-factors. This thesis resolves the time-complexity issue in two steps.
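To fix ideas, the following is a minimal NumPy sketch of one K-FAC preconditioning step for a single fully connected layer (the names and the damping parameter lam are illustrative, not the thesis' notation); the two dense inversions are the cubic-cost bottleneck referred to above:

    import numpy as np

    def kfac_precondition(grad_W, A_bar, G_bar, lam=1e-2):
        # One K-FAC preconditioning step for a single fully connected
        # layer (minimal sketch; names and damping lam are illustrative).
        #   grad_W : weight gradient, shape (d_out, d_in)
        #   A_bar  : EA K-factor built from layer inputs, (d_in, d_in)
        #   G_bar  : EA K-factor built from output gradients, (d_out, d_out)
        # Each dense inversion below is O(d^3) in its factor's size --
        # the cubic bottleneck in layer size.
        A_inv = np.linalg.inv(A_bar + lam * np.eye(A_bar.shape[0]))
        G_inv = np.linalg.inv(G_bar + lam * np.eye(G_bar.shape[0]))
        # By the Kronecker-product identity, applying the inverse of the
        # Kronecker-factored curvature to vec(grad_W) reduces to two
        # matrix products with the factor inverses.
        return G_inv @ grad_W @ A_inv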
First, we propose low-rank inverse representations of the K-factors, obtainable in quadratic time with randomized Numerical Linear Algebra (NLA). Our theory shows that, at least for sufficiently large layers, the K-factor representation error can also be controlled, owing to the exponentially averaged (EA) construction paradigm of the K-factors. Numerical results show that the theory is pessimistic: the error can be controlled even better in practical settings. By also improving the inverse-application subroutine to scale quadratically, we obtain two quadratic time-complexity K-FAC variants: R-KFAC and SRE-KFAC.
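A minimal sketch of the kind of low-rank inverse representation meant here, using a randomized range finder (one standard randomized-NLA tool; the thesis' exact subroutine may differ):

    import numpy as np

    def lowrank_kfactor_inverse(M, r, seed=0):
        # Rank-r eigendecomposition of a symmetric PSD K-factor M (d x d),
        # built with a randomized range finder in O(d^2 r) time.
        rng = np.random.default_rng(seed)
        d = M.shape[0]
        Q, _ = np.linalg.qr(M @ rng.standard_normal((d, r)))  # O(d^2 r)
        s, V = np.linalg.eigh(Q.T @ M @ Q)                    # O(r^3)
        U = Q @ V                        # approximate dominant eigenvectors
        return U, s

    def apply_inverse(U, s, lam, X):
        # Applies (M + lam*I)^{-1} ~= (1/lam) I + U diag(1/(s+lam) - 1/lam) U^T
        # to X without ever forming a dense d x d inverse: O(d r) per column.
        coeff = 1.0 / (s + lam) - 1.0 / lam
        return X / lam + U @ (coeff[:, None] * (U.T @ X))

Building the representation costs O(d^2 r), and applying it to the whole gradient block costs O(d^2 r) as well, which is where the quadratic overall scaling comes from.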
Second, by using Brand's algorithm for online NLA to exploit the EA construction paradigm of the K-factors, we maintain online low-rank inverse representations of the K-factors, which takes only linear time. The approach is limited to sufficiently large layers, a condition typically satisfied in practice by fully connected layers. By further improving the inverse-application subroutine to scale linearly in this case as well, we obtain a linear time-complexity K-FAC variant: B-KFAC. B-KFAC also has linear memory complexity in layer size, whereas K-FAC's is quadratic. We further propose algorithmic control of the B-KFAC representation error, at the price of quadratic time costs on some iterations (most iterations still take linear time); this yields three further algorithms.
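The flavor of such an online update can be sketched as a rank-one exponential-average update of a maintained low-rank eigendecomposition, in the spirit of Brand's algorithm (illustrative names, not the thesis' exact B-KFAC routine; Brand's algorithm defers the basis rotation to cheapen updates further, whereas this sketch applies it eagerly and so costs O(d r^2 + r^3), still linear in the layer size d):

    import numpy as np

    def brand_style_ea_update(U, s, c, rho=0.95):
        # Maintains a rank-r eigendecomposition M ~= U diag(s) U^T under
        # the exponential-average step  M' = rho*M + (1-rho)*c c^T.
        d, r = U.shape
        w = U.T @ c                    # component of c inside the subspace
        res = c - U @ w                # orthogonal residual
        beta = np.linalg.norm(res)
        p = res / beta if beta > 1e-12 else np.zeros(d)
        # Small (r+1)x(r+1) core matrix in the extended basis [U, p].
        z = np.append(w, beta)
        K = rho * np.diag(np.append(s, 0.0)) + (1.0 - rho) * np.outer(z, z)
        s_new, V = np.linalg.eigh(K)             # O(r^3)
        U_new = np.hstack([U, p[:, None]]) @ V   # eager rotation: O(d r^2)
        # eigh sorts ascending; drop the smallest eigenpair to keep rank r.
        return U_new[:, 1:], s_new[1:]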
Our improved-complexity K-FAC variants speed up K-FAC by a factor of 3-4 in most investigated settings, in both single-GPU and distributed configurations.
Authors
- Puiu, CO
Contributors
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Mathematical Institute
- Role:
- Supervisor
- Institution:
- Numerical Algorithms Group
- Role:
- Supervisor
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Mathematical Institute
- Role:
- Examiner
- Institution:
- Google DeepMind
- Role:
- Examiner
- Funding agency for:
- Puiu, CO
- Grant:
- EP/L015803/1
- Programme:
- CDT in Industrially Focused Mathematical Modelling
- Details:
- This research was supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1), in collaboration with the Numerical Algorithms Group and St Anne's College.
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2024-07-21