Preprint
How well does reinforcement learning scale?
- Abstract:
- I demonstrate (using benchmark data from OpenAI's reasoning models) that LLM performance scales much less well with more RL training compute than with more inference compute. To achieve the same gain as one gets from scaling up inference compute by 100x, RL training compute typically needs to be scaled up by 10,000x. Such RL scale-ups have been possible due to starting from a very low base (giving the impressive gains seen over the last year) but will soon be impractical. This means that most future performance gains from the RL scaling paradigm will require continued scaling of the number of reasoning tokens (and thus the deployment costs) by orders of magnitude.
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
Access Document
- Files:
- Pre-print (pdf, 1.3MB)
- Publication website:
- https://www.tobyord.com/writing/how-well-does-rl-scale
Authors
- Publication date:
- 2025-10-20
- Language:
- English
- Pubs id:
- 2357827
- Local pid:
- pubs:2357827
- Deposit date:
- 2026-01-12
- ARK identifier:
Terms of use
- Copyright holder:
- Toby Ord
- Copyright date:
- 2025