Journal article
Is SGD a Bayesian sampler? Well, almost
- Abstract:
- Deep neural networks (DNNs) generalise remarkably well in the overparameterised regime, suggesting a strong inductive bias towards functions with low generalisation error. We empirically investigate this bias by calculating, for a range of architectures and datasets, the probability $P_{\mathrm{SGD}}(f \mid S)$ that an overparameterised DNN, trained with stochastic gradient descent (SGD) or one of its variants, converges on a function $f$ consistent with a training set $S$. We also use Gaussian processes to estimate the Bayesian posterior probability $P_B(f \mid S)$ that the DNN expresses $f$ upon random sampling of its parameters, conditioned on $S$. Our main findings are that $P_{\mathrm{SGD}}(f \mid S)$ correlates remarkably well with $P_B(f \mid S)$ and that $P_B(f \mid S)$ is strongly biased towards low-error and low-complexity functions. These results imply that strong inductive bias in the parameter-function map (which determines $P_B(f \mid S)$), rather than a special property of SGD, is the primary explanation for why DNNs generalise so well in the overparameterised regime. While our results suggest that the Bayesian posterior $P_B(f \mid S)$ is the first-order determinant of $P_{\mathrm{SGD}}(f \mid S)$, there remain second-order differences that are sensitive to hyperparameter tuning. A function probability picture, based on $P_{\mathrm{SGD}}(f \mid S)$ and/or $P_B(f \mid S)$, can shed light on the way that variations in architecture or hyperparameter settings, such as batch size, learning rate, and optimiser choice, affect DNN performance.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Publication website:
- https://jmlr.org/papers/v22/20-676.html
Authors
- Publisher:
- Journal of Machine Learning Research
- Journal:
- Journal of Machine Learning Research
- Volume:
- 22
- Pages:
- 1-64
- Article number:
- 79
- Publication date:
- 2021-02-15
- Acceptance date:
- 2020-10-15
- EISSN:
- 1533-7928
- ISSN:
- 1532-4435
- Language:
- English
- Pubs id:
- 1179937
- Local pid:
- pubs:1179937
- Deposit date:
- 2021-11-24
Terms of use
- Copyright holder:
- Mingard et al.
- Copyright date:
- 2021
- Rights statement:
- © 2021 Chris Mingard, Guillermo Valle Pérez, Joar Skalse and Ard A. Louis. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-676.html.
- Licence:
- CC Attribution (CC BY)