Conference item
"Hey, that's not an ODE": faster ODE adjoints with 12 lines of code
- Abstract:
- Neural differential equations may be trained by backpropagating gradients via the adjoint method, which is another differential equation typically solved using an adaptive-step-size numerical differential equation solver. A proposed step is accepted if its error, relative to some norm, is sufficiently small; else it is rejected, the step is shrunk, and the process is repeated. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choices of norm (such as L2) unnecessarily stringent. By replacing it with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and the backpropagation is made faster. This requires only minor code modifications. Experiments on a wide range of tasks, including time series, generative modeling, and physical control, demonstrate a median improvement of 40% fewer function evaluations. On some problems we see as much as 62% fewer function evaluations, so that the overall training time is roughly halved.
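The abstract describes the accept/reject test of an adaptive-step-size solver and the proposed switch from an L2-type norm to a (semi)norm over the augmented adjoint state. The sketch below illustrates that test under those assumptions; it is not the authors' implementation, and the helper names (rms_norm, make_seminorm, accept_step) are hypothetical.

    # A minimal sketch, assuming a NumPy implementation, of the accept/reject
    # test described in the abstract. Names are illustrative, not from the paper.
    import numpy as np

    def rms_norm(x):
        # Conventional choice: RMS (scaled L2) norm over the whole augmented state.
        return float(np.sqrt(np.mean(x ** 2)))

    def make_seminorm(n_state):
        # Seminorm measuring error only on the first n_state components
        # (e.g. the state and adjoint), ignoring the parameter-gradient block.
        def seminorm(x):
            return float(np.sqrt(np.mean(x[:n_state] ** 2)))
        return seminorm

    def accept_step(error_estimate, y, y_new, norm, rtol=1e-6, atol=1e-9):
        # Standard error control: accept the step when the chosen norm of the
        # scaled local error estimate is at most 1. A looser (semi)norm means
        # fewer rejections, hence fewer function evaluations.
        scale = atol + rtol * np.maximum(np.abs(y), np.abs(y_new))
        return norm(error_estimate / scale) <= 1.0

With norm=rms_norm this reproduces the usual test; with norm=make_seminorm(n_state) the parameter-gradient components are ignored during the adjoint pass, which is the change the paper advocates. In the torchdiffeq library the corresponding switch is, to the best of my knowledge, adjoint_options=dict(norm="seminorm") passed to odeint_adjoint; check the library's documentation before relying on that detail.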
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record (PDF, 775.9 KB)
- Publication website:
- http://proceedings.mlr.press/v139/kidger21a.html
Authors
- Kidger, Patrick; Chen, Ricky T. Q.; Lyons, Terry
- Publisher:
- Journal of Machine Learning Research
- Pages:
- 5443-5452
- Series:
- Proceedings of Machine Learning Research
- Series number:
- 139
- Publication date:
- 2021-07-01
- Acceptance date:
- 2021-05-08
- Event title:
- Thirty-eighth International Conference on Machine Learning (ICML 2021)
- Event location:
- Virtual event
- Event website:
- https://icml.cc/Conferences/2021
- Event start date:
- 2021-07-18
- Event end date:
- 2021-07-24
- ISSN:
- 2640-3498
- Language:
- English
- Keywords:
- Pubs id:
- 1133477
- Local pid:
- pubs:1133477
- Deposit date:
- 2021-02-18
Terms of use
- Copyright holder:
- Kidger et al.
- Copyright date:
- 2020
- Rights statement:
- Copyright 2021 by the author(s).
- Notes:
- This paper was presented at the Thirty-eighth International Conference on Machine Learning (ICML 2021), 18-24 July 2021, Virtual event.