
Conference item

Clinical-R1: empowering large language models for faithful and comprehensive reasoning with clinical objective relative policy optimization

Abstract:
Recent advances in large language models (LLMs), demonstrated by DeepSeek-R1, have shown that strong reasoning capabilities can emerge from large-scale pretraining followed by post-training reinforcement learning. However, current post-training methods, such as Group Relative Policy Optimization (GRPO), mainly reward correctness. This is not aligned with the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based, verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. Experiments on three benchmarks show that CRPO substantially improves the faithfulness and comprehensiveness of reasoning over standard GRPO while still improving accuracy. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare, and highlights the potential of multi-objective, verifiable RL methods for scaling the post-training of LLMs in medical domains.
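The abstract does not give CRPO's reward functions or how they are weighted, so the sketch below is illustrative only and is not the authors' implementation. It assumes that each sampled completion has already received three scores in [0, 1] (accuracy, faithfulness, comprehensiveness), that these are combined by a weighted sum with hypothetical equal weights, and that the scalar reward is then normalized within its sampling group as in GRPO. The names `ObjectiveScores`, `RewardWeights`, `scalar_reward`, and `group_relative_advantages` are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ObjectiveScores:
    """Per-completion scores in [0, 1].

    Hypothetical: how each score is computed (e.g. by rule-based checks)
    is not specified in the abstract.
    """
    accuracy: float
    faithfulness: float
    comprehensiveness: float


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical equal weights; the paper's actual weighting is not given here.
    accuracy: float = 1.0
    faithfulness: float = 1.0
    comprehensiveness: float = 1.0


def scalar_reward(s: ObjectiveScores, w: RewardWeights) -> float:
    """Collapse the three objectives into a single scalar via a weighted sum."""
    return (w.accuracy * s.accuracy
            + w.faithfulness * s.faithfulness
            + w.comprehensiveness * s.comprehensiveness)


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward against its own group.

    Uses the population standard deviation; implementations differ on this.
    The eps term avoids division by zero when all rewards in a group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical completions sampled for the same clinical question.
    group = [
        ObjectiveScores(1, 0.8, 0.6),  # correct, well-grounded, thorough
        ObjectiveScores(1, 0.2, 0.3),  # correct but weakly reasoned
        ObjectiveScores(0, 0.9, 0.9),
        ObjectiveScores(0, 0.1, 0.1),
    ]
    acc_only = group_relative_advantages([s.accuracy for s in group])
    multi = group_relative_advantages(
        [scalar_reward(s, RewardWeights()) for s in group])
    print("accuracy-only advantages:", acc_only)
    print("multi-objective advantages:", multi)
```

With an accuracy-only reward, the first two completions receive identical advantages even though their reasoning differs in quality. The weighted multi-objective reward separates them, which is the kind of signal the abstract attributes to rewarding faithfulness and comprehensiveness alongside correctness.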
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://proceedings.mlr.press/v317/gu26a.html

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
PMLR
Host title:
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
Volume:
317
Pages:
117-126
Series:
Proceedings of Machine Learning Research
Publication date:
2026-03-14
Acceptance date:
2025-12-02
Event title:
2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026)
Event location:
Singapore
Event website:
https://aaai.org/conference/aaai/aaai-26/
Event start date:
2026-01-20
Event end date:
2026-01-21
ISSN:
2640-3498


Language:
English
Pubs id:
2358178
Local pid:
pubs:2358178
Deposit date:
2026-01-13