How well does reinforcement learning scale?

Preprint

Abstract:: I demonstrate, using benchmark data from OpenAI's reasoning models, that LLM performance scales far less well with RL training compute than with inference compute. Matching the gain from a 100x scale-up of inference compute typically requires a 10,000x scale-up of RL training compute. Such RL scale-ups have been possible because RL training started from a very low base (which gave the impressive gains seen over the last year), but they will soon be impractical. This means that most future performance gains from the RL scaling paradigm will require continuing to scale the number of reasoning tokens (and thus the deployment costs) by orders of magnitude.
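
The arithmetic behind the 100x versus 10,000x comparison can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the paper's own method: it assumes that performance gains are roughly linear in the logarithm of compute on both axes, so that the single pair of figures quoted above implies a fixed exponent. The function name `equivalent_rl_scaleup` and the extrapolation to other scale-ups are hypothetical additions made for illustration.

```python
import math

# Figures quoted in the abstract: a 100x inference scale-up gives about the
# same gain as a 10,000x RL training scale-up.
INFERENCE_SCALEUP = 100
RL_SCALEUP = 10_000

# Assumption (not stated in the abstract): gains are linear in log(compute)
# on both axes, so the ratio of the logs is a constant exponent.
implied_exponent = math.log10(RL_SCALEUP) / math.log10(INFERENCE_SCALEUP)


def equivalent_rl_scaleup(inference_scaleup: float,
                          exponent: float = implied_exponent) -> float:
    """Hypothetical helper: RL training compute multiplier that would match
    the gain from a given inference compute multiplier, under the
    log-linear assumption above."""
    return inference_scaleup ** exponent


if __name__ == "__main__":
    for scale in (10, 100, 1_000):
        print(f"{scale}x inference ~ {equivalent_rl_scaleup(scale):,.0f}x RL training")
```

Under this assumption the exponent comes out at about 2, meaning each order of magnitude of inference compute corresponds to roughly two orders of magnitude of RL training compute. The abstract itself says only "typically", so the relationship should be read as approximate.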

Files:: Ord_2025_How_well_does.pdf

(Pre-print, PDF, 1.3MB)

Publication website:: https://www.tobyord.com/writing/how-well-does-rl-scale

Licence:: Terms and Conditions of Use for Oxford University Research Archive
