Journal article
Over-parameterised shallow neural networks with asymmetrical node scaling: global convergence guarantees and feature learning
- Abstract:
- We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record, pdf, 6.9MB
- Publication website:
- https://openreview.net/forum?id=Sx1khIIi95
Authors
- Publisher:
- Journal of Machine Learning Research
- Journal:
- Transactions on Machine Learning Research
- Volume:
- 2025
- Issue:
- 2
- Publication date:
- 2025-02-18
- Acceptance date:
- 2025-02-10
- EISSN:
- 2835-8856
- Language:
- English
- Pubs id:
- 2085618
- Local pid:
- pubs:2085618
- Deposit date:
- 2025-02-13
- ARK identifier:
Terms of use
- Copyright holder:
- Caron et al
- Copyright date:
- 2025
- Rights statement:
- ©2025 The Authors. This paper is an open access article distributed under the terms of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)
- Licence:
- CC Attribution (CC BY)