Conference item

Beyond the mean: Fisher-orthogonal projection for natural gradient descent in large batch training

Abstract:

Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such large batch sizes. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural gradient with Kronecker-Factored Approximate Curvature (K-FAC) often require excessively high damping to remain stable at large batch sizes. This high damping effectively "washes out" the curvature information that gives these methods their advantage, reducing their performance to that of simple gradient descent. In this paper, we introduce Fisher-Orthogonal Projection (FOP), a novel technique that restores the effectiveness of second-order methods at very large batch sizes, enabling scalable training with improved generalization and faster convergence. FOP constructs a variance-aware update direction by leveraging gradients from two sub-batches, enhancing the average gradient with a component of the gradient difference that is orthogonal to the average under the Fisher metric. Through extensive benchmarks, we show that FOP accelerates convergence by 1.2–1.3× over K-FAC and 1.5–1.7× over SGD/AdamW at the same moderate batch sizes, while at extreme scales it achieves up to a 7.5× speedup. Unlike other methods, FOP maintains small-batch accuracy when scaling to extremely large batch sizes. Moreover, it reduces Top-1 error by 2.3–3.3% on long-tailed CIFAR benchmarks, demonstrating robust generalization under severe class imbalance. Our lightweight, geometry-aware use of intra-batch variance makes natural-gradient optimization practical on modern data-centre GPUs.
FOP is open-source and pip-installable, and it can be integrated into existing training code with a single line and no extra configuration.
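
The projection described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation or the API of the released package: the function names, the mixing weight `alpha`, and the use of an explicit dense Fisher matrix are assumptions made for illustration (a practical method such as K-FAC would use a factored approximation instead). The sketch averages two sub-batch gradients, removes from their difference the part that is parallel to the average under the Fisher inner product, and adds the remaining orthogonal component back to the average.

```python
import numpy as np


def fisher_inner(u, v, fisher):
    """Inner product <u, v>_F = u^T F v under the Fisher metric."""
    return float(u @ fisher @ v)


def fisher_orthogonal_direction(g1, g2, fisher, alpha=1.0):
    """Combine two sub-batch gradients (hypothetical helper, not the paper's code).

    Returns the combined direction and the Fisher-orthogonal component.
    `alpha` is an assumed mixing weight; the abstract does not specify one.
    """
    g_avg = 0.5 * (g1 + g2)   # average gradient over the two sub-batches
    g_diff = g1 - g2          # gradient difference (intra-batch variation)
    denom = fisher_inner(g_avg, g_avg, fisher)
    if denom <= 0.0:
        # Degenerate case: nothing to project against.
        return g_avg, np.zeros_like(g_avg)
    # Remove the part of g_diff that is parallel to g_avg under the Fisher metric.
    coeff = fisher_inner(g_diff, g_avg, fisher) / denom
    g_perp = g_diff - coeff * g_avg
    return g_avg + alpha * g_perp, g_perp


# Toy example with a random symmetric positive-definite "Fisher" matrix.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
fisher = a @ a.T + 1e-3 * np.eye(5)
g1 = rng.standard_normal(5)
g2 = rng.standard_normal(5)

direction, g_perp = fisher_orthogonal_direction(g1, g2, fisher)
g_avg = 0.5 * (g1 + g2)

# The added component is orthogonal to the average under the Fisher metric.
residual = fisher_inner(g_perp, g_avg, fisher)

# Natural-gradient style preconditioning: solve F * step = direction.
step = np.linalg.solve(fisher, direction)
```

Because the added component is Fisher-orthogonal to the average gradient, it contributes information from the sub-batch disagreement without rescaling the average direction itself, which is one way to read the "variance-aware" description above.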

Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1609/aaai.v40i29.39590

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0003-1756-3064


Funder identifier:
https://ror.org/028z36n30
Grant:
EP/T022205/1


Publisher:
Association for the Advancement of Artificial Intelligence
Host title:
Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence
Volume:
40
Issue:
29
Pages:
24115-24123
Publication date:
2026-03-14
Event title:
40th AAAI Conference on Artificial Intelligence (AAAI 2026)
Event location:
Singapore
Event website:
https://aaai.org/conference/aaai/aaai-26/
Event start date:
2026-01-20
Event end date:
2026-01-27
DOI:
10.1609/aaai.v40i29.39590
EISSN:
2374-3468
ISSN:
2159-5399
ISBN-10:
1577359062
ISBN-13:
9781577359067


Language:
English
Pubs id:
2405333
Local pid:
pubs:2405333
Source identifiers:
W7138078850
Deposit date:
2026-04-29