Conference item

MALT: improving reasoning with multi-agent LLM training

Abstract:

Large Language Models (LLMs) often produce answers with a single chain-of-thought, which restricts their ability to explore reasoning paths or self-correct flawed outputs in complex tasks. In this paper, we introduce MALT (Multi-Agent LLM Training), a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps using a sequential pipeline of heterogeneous agents. During data generation, each agent is repeatedly sampled to form a multi-agent search tree, where final outputs are graded against ground-truth data. We then apply value iteration to propagate reward signals back to each role-conditioned model, automatically producing multi-agent post-training data without human or teacher-model supervision. Our off-policy approach allows each agent to specialize by learning from correct and incorrect trajectories, ultimately improving the end-to-end reasoning chain. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively, making it an important advance towards multi-agent cooperative training.
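The data-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `toy_agent` function is a hypothetical stand-in for sampling a role-conditioned LLM, and both the backup rule (a node's value is the mean of its children's values) and the 0.5 labelling threshold are assumptions chosen for illustration, since the abstract states only that value iteration propagates rewards from graded final outputs.

```python
import random
from dataclasses import dataclass, field
from statistics import mean

ROLES = ("generator", "verifier", "refiner")  # sequential pipeline from the abstract


@dataclass
class Node:
    role: str
    output: str
    children: list = field(default_factory=list)
    value: float = 0.0


def toy_agent(role, context, rng):
    """Hypothetical stand-in for sampling one role-conditioned model."""
    if role == "refiner":
        return str(rng.choice([4, 4, 5]))  # candidate final answers
    return f"{role}-draft-{rng.randint(0, 999)}"


def build_tree(context, branching, rng, depth=0):
    """Sample each role `branching` times per parent to form the search tree."""
    if depth == len(ROLES):
        return []
    nodes = []
    for _ in range(branching):
        out = toy_agent(ROLES[depth], context, rng)
        node = Node(ROLES[depth], out)
        node.children = build_tree(context + "\n" + out, branching, rng, depth + 1)
        nodes.append(node)
    return nodes


def backup(node, truth):
    """Grade leaves against ground truth, then propagate values upward.

    Assumption: an internal node's value is the mean of its children's values.
    """
    if not node.children:
        node.value = 1.0 if node.output == truth else 0.0
    else:
        node.value = mean(backup(child, truth) for child in node.children)
    return node.value


def label(nodes, threshold=0.5, rows=None):
    """Flatten the tree into (role, output, value, is_positive) rows.

    Assumption: outputs with value above `threshold` count as positive examples.
    """
    rows = [] if rows is None else rows
    for node in nodes:
        rows.append((node.role, node.output, node.value, node.value > threshold))
        label(node.children, threshold, rows)
    return rows


def make_dataset(question, truth, branching=3, seed=0):
    rng = random.Random(seed)
    roots = build_tree(question, branching, rng)
    for root in roots:
        backup(root, truth)
    return label(roots)
```

Calling `make_dataset("What is 2 + 2?", "4")` yields one row per sampled output, grouped by role, so each role-conditioned model could in principle be trained on its own positive and negative examples without human or teacher-model labels.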

Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://openreview.net/forum?id=jXP9bgFack#discussion

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
OpenReview
Article number:
466
Publication date:
2025-07-08
Acceptance date:
2025-07-07
Event title:
2nd Conference on Language Modeling (COLM 2025)
Event location:
Montreal, Canada
Event website:
https://colmweb.org/index.html
Event start date:
2025-10-07
Event end date:
2025-10-10


Language:
English
Pubs id:
2287121
Local pid:
pubs:2287121
Deposit date:
2025-09-09
