Thesis
Systems and methods for improving large language models with increased inference compute
- Abstract:
- Scaling the amount of compute used to train language models has led to massive increases in their capability. A relatively under-explored direction has been scaling the amount of compute used at inference time. This thesis explores the design space for test-time compute algorithms, focusing on the simple method of repeated sampling. Notably, this thesis demonstrates that coverage, the fraction of problems solved by at least one sample, often scales log-linearly with the number of samples over four orders of magnitude. In domains where automatic verification tools are available, this allows Llama-8b to outperform the single-attempt performance of GPT4o across all tasks studied. These increases in coverage can often be modeled with an exponentiated power law, which enables forecasting coverage up to 10k samples (the largest sample size attempted) using an order of magnitude fewer samples, with an average error of 3.84% across all models and datasets. In domains where tools for verification are available, these gains in coverage directly translate into increased capability. In other domains, methods for selecting a correct sample from a large sample collection are required. This thesis investigates this selection problem in the salient application of software engineering. Using a combination of unit-test based voting and model-based selection, half of the accuracy gap between the single-attempt lower bound and the coverage upper bound can be recovered. Finally, this thesis explores the systems implications of test-time compute scaling and presents an exact implementation of attention that accelerates inference throughput in these settings by over 10x.
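- The coverage metric described above is essentially pass@k averaged over problems. As a minimal illustrative sketch (not taken from the thesis itself), the snippet below estimates coverage from a fixed pool of n samples per problem using the standard unbiased pass@k estimator of Chen et al. (2021); the estimator choice, function names, and example numbers are assumptions for illustration only.

```python
# Sketch: estimating coverage (fraction of problems solved by at least one of k samples)
# from repeated-sampling results. Assumes n samples were drawn per problem and the
# number of correct samples per problem is known; uses the unbiased pass@k estimator.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n total samples of which c are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def coverage(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds (n_samples, n_correct) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical example: 3 problems, 100 samples each, with 0, 2, and 40 correct samples.
results = [(100, 0), (100, 2), (100, 40)]
for k in (1, 10, 100):
    print(f"coverage@{k} = {coverage(results, k):.3f}")
```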
- DOI:
- Type of award:
- MSc by Research
- Level of award:
- Masters
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2025-11-21