
Journal article

Over-parameterised shallow neural networks with asymmetrical node scaling: global convergence guarantees and feature learning

Abstract:
We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
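To make the setting in the abstract concrete, here is a minimal sketch of a shallow ReLU network in which each hidden node's output is multiplied by a positive scale. It assumes the form f(x) = sum_j sqrt(lam_j) * v_j * relu(<w_j, x>). It compares uniform NTK-style scales (lam_j = 1/m) with one non-identical choice. The function names and the particular asymmetric scheme (`asymmetric_scales`, with a uniform part plus a j**(-alpha) decaying part) are hypothetical illustrations, not the paper's exact construction.

```python
import math
import random


def ntk_scales(m):
    """Uniform NTK-style scaling: every hidden node gets weight 1/m."""
    return [1.0 / m] * m


def asymmetric_scales(m, gamma=0.5, alpha=2.0):
    """Hypothetical non-identical scaling (illustrative assumption only).

    A fraction (1 - gamma) of the total mass is spread uniformly, as in
    the NTK case; the remaining fraction gamma decays like j**(-alpha),
    so the first few nodes keep a scale that does not vanish as m grows.
    The scales are positive and sum to 1.
    """
    decay = [(j + 1) ** (-alpha) for j in range(m)]
    total = sum(decay)
    return [(1.0 - gamma) / m + gamma * d / total for d in decay]


def relu(z):
    return max(0.0, z)


def forward(x, W, v, lam):
    """Shallow network output f(x) = sum_j sqrt(lam_j) * v_j * relu(<w_j, x>)."""
    out = 0.0
    for w_j, v_j, lam_j in zip(W, v, lam):
        pre = sum(wi * xi for wi, xi in zip(w_j, x))  # pre-activation <w_j, x>
        out += math.sqrt(lam_j) * v_j * relu(pre)     # node output scaled by sqrt(lam_j)
    return out


def init_params(m, d, seed=0):
    """Standard Gaussian initialisation of hidden weights W (m x d) and output weights v."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return W, v


if __name__ == "__main__":
    m, d = 1000, 3
    W, v = init_params(m, d)
    x = [0.5, -1.0, 2.0]
    print("NTK output:       ", forward(x, W, v, ntk_scales(m)))
    print("asymmetric output:", forward(x, W, v, asymmetric_scales(m)))
    print("largest scale, NTK vs asymmetric:",
          ntk_scales(m)[0], asymmetric_scales(m)[0])
```

Under the uniform choice every node's scale is 1/m, which shrinks as the width m grows. Under the illustrative asymmetric choice, the first node's scale stays above roughly 0.3 for any m, because the decaying part sums to a bounded constant.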
Publication status:
Published
Peer review status:
Peer reviewed


Publication website:
https://openreview.net/forum?id=Sx1khIIi95

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Oxford college:
Keble College
Role:
Author
ORCID:
0000-0002-3952-224X


Publisher:
Journal of Machine Learning Research
Journal:
Transactions on Machine Learning Research
Volume:
2025
Issue:
2
Publication date:
2025-02-18
Acceptance date:
2025-02-10
EISSN:
2835-8856


Language:
English
Pubs id:
2085618
Local pid:
pubs:2085618
Deposit date:
2025-02-13
ARK identifier:



