Conference item

Towards agents that know when they don't know: uncertainty as a control signal for structured reasoning

Abstract:

Synthetic chain-of-thought (CoT) traces are widely used to train large reasoning models (LRMs), improving generalization by providing step-level supervision. Yet most approaches require ground-truth labels to seed or filter these traces, an expensive bottleneck in domains like biology where wet-lab data are scarce. We propose a label-free alternative: uncertainty-based filtering, which uses a model’s own confidence (quantified through established uncertainty metrics like self-consistency and predictive perplexity) as a substitute for external labels. We sample multiple reasoning traces and retain only low-uncertainty subsets. Applied to biological perturbation prediction, a domain where wet-lab labels are especially costly, we show that the filtered subset has higher accuracy, and that supervised fine-tuning (SFT) on uncertainty-filtered data outperforms unfiltered synthetic data, narrows the gap to ground-truth training, and surpasses strong LRM baselines. Ablations show that per-class filtering corrects for class-specific uncertainty scales and that hybrid uncertainty metrics yield higher-quality datasets. Our results suggest that model-internal confidence is a powerful signal for efficient reasoning dataset creation, enabling LRMs in domains where supervision is expensive.
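
The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `keep_fraction` threshold, the example records, and the use of the majority answer as the class label are all assumptions made for illustration. It scores each question by self-consistency (disagreement among sampled answers), then keeps the lowest-uncertainty fraction within each predicted class, which is one plausible reading of "per-class filtering". A perplexity helper is included as a second metric; how the paper combines metrics into a hybrid score is not stated in this record, so no combination is shown.

```python
import math
from collections import Counter, defaultdict


def self_consistency_uncertainty(answers):
    """Return (majority answer, 1 - share of samples agreeing with it)."""
    counts = Counter(answers)
    majority, votes = counts.most_common(1)[0]
    return majority, 1.0 - votes / len(answers)


def perplexity(token_logprobs):
    """Perplexity: exp of the mean negative token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_per_class(examples, keep_fraction=0.5):
    """Keep the lowest-uncertainty fraction of examples in each predicted class.

    Ranking within a class (rather than globally) is an assumed way to
    correct for classes whose uncertainty scores sit on different scales.
    """
    by_class = defaultdict(list)
    for ex in examples:
        label, unc = self_consistency_uncertainty(ex["answers"])
        by_class[label].append((unc, ex["id"]))

    kept = []
    for label, scored in by_class.items():
        scored.sort()  # ascending uncertainty
        n_keep = max(1, math.ceil(keep_fraction * len(scored)))
        kept.extend((ex_id, label) for _, ex_id in scored[:n_keep])
    return sorted(kept)


# Hypothetical sampled answers for four perturbation-prediction questions.
examples = [
    {"id": "q1", "answers": ["up", "up", "up", "up", "up"]},
    {"id": "q2", "answers": ["up", "up", "down", "down", "up"]},
    {"id": "q3", "answers": ["down", "down", "down", "down", "up"]},
    {"id": "q4", "answers": ["down", "up", "down", "none", "down"]},
]

kept = filter_per_class(examples, keep_fraction=0.5)
print(kept)
```

In this sketch the retained pairs (question id, majority answer) would serve as pseudo-labelled data for fine-tuning; no ground-truth label is consulted at any point.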

Publication status:
Accepted
Peer review status:
Peer reviewed

Publication website:
https://openreview.net/forum?id=jnioLSDyeX

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0006-0259-5732


Publisher:
OpenReview
Publication date:
2025-09-16
Acceptance date:
2025-10-16
Event title:
39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Event location:
San Diego, California, USA
Event website:
https://neurips.cc/Conferences/2025
Event start date:
2025-11-30
Event end date:
2025-12-07


Language:
English
Pubs id:
2335629
Local pid:
pubs:2335629
Deposit date:
2025-11-26
ARK identifier:
