Conference item: Poster
An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem
- Abstract:
- Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
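The abstract refers to a multitask sparse parity dataset whose tasks follow a power-law distribution, but this record does not define it. The following is a minimal sketch of one common formulation, offered only as an illustration: each input is a one-hot block of control bits that selects the active task, followed by random bits, and the label is the parity of a fixed task-specific subset of those random bits. The function name, the parameters `n_tasks`, `n_bits`, `k`, and `alpha`, and the exponent convention (task frequency proportional to rank to the power `-(alpha + 1)`) are assumptions, not details taken from the record.

```python
import random


def make_multitask_sparse_parity(n_samples, n_tasks=5, n_bits=16, k=3,
                                 alpha=0.5, seed=0):
    """Generate a toy multitask sparse parity dataset (illustrative sketch)."""
    rng = random.Random(seed)

    # Assumption: each task owns a fixed random subset of k bit positions.
    subsets = [sorted(rng.sample(range(n_bits), k)) for _ in range(n_tasks)]

    # Assumption: the task of rank i (1-based) has weight i^-(alpha + 1).
    weights = [(i + 1) ** -(alpha + 1.0) for i in range(n_tasks)]
    tasks = rng.choices(range(n_tasks), weights=weights, k=n_samples)

    inputs, labels = [], []
    for t in tasks:
        # One-hot control bits mark which task is active for this sample.
        control = [1 if j == t else 0 for j in range(n_tasks)]
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        # Label is the parity of the active task's subset of bits.
        labels.append(sum(bits[j] for j in subsets[t]) % 2)
        inputs.append(control + bits)
    return inputs, labels, tasks, subsets


if __name__ == "__main__":
    X, y, tasks, subsets = make_multitask_sparse_parity(1000)
    # Count how often each task appears; low-rank tasks should dominate.
    counts = [tasks.count(t) for t in range(len(subsets))]
    print("samples per task:", counts)
```

Under these assumptions, rare (high-rank) tasks appear in few samples. This gives a concrete picture of skills that could be learned at different training times or data sizes, which is the setting the abstract describes.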
- Publisher:
- Curran Associates
- Host title:
- Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
- Volume:
- 37
- Publication date:
- 2024-09-25
- Acceptance date:
- 2024-11-06
- Event title:
- 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
- Event location:
- Vancouver, BC, Canada
- Event website:
- http://neurips.cc/Conferences/2024
- Event start date:
- 2024-12-10
- Event end date:
- 2024-12-15
- ISSN:
- 1049-5258
- Language:
- English
- Subtype:
- Poster
- Pubs id:
- 2101667
- Local pid:
- pubs:2101667
- Deposit date:
- 2025-06-09
Terms of use
- Copyright holder:
- Nam et al. and NeurIPS
- Copyright date:
- 2024
- Rights statement:
- © (2024) by individual authors and Neural Information Processing Systems Foundation Inc. All rights reserved.
- Notes:
- This paper was presented at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 10th-15th December 2024, Vancouver, BC, Canada. This is the accepted manuscript version of the article. The final version is available online from Curran Associates at: https://proceedings.neurips.cc/paper_files/paper/2024/hash/45f7ad60c01f17711ccd8ac2f2fb77e3-Abstract-Conference.html
- Licence:
- CC Attribution (CC BY)