Conference item
Clinical-R1: empowering large language models for faithful and comprehensive reasoning with clinical objective relative policy optimization
- Abstract:
- Recent advances in large language models (LLMs), demonstrated by DeepSeek-R1, have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning. However, current post-training methods, such as Group Relative Policy Optimization (GRPO), mainly reward correctness. This is not aligned with the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based, verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. Experiments on three benchmarks show that CRPO substantially improves the truthfulness and completeness of reasoning over standard GRPO while still delivering accuracy improvements. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare. It also highlights the potential of multi-objective, verifiable RL methods for scaling the post-training of LLMs in medical domains.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
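The abstract contrasts GRPO, which mainly rewards correctness, with CRPO, which combines several rule-based reward signals. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each sampled response already has per-objective scores in [0, 1]. It combines those scores with a hypothetical weighted sum; `WEIGHTS` and `combined_reward` are illustrative names, and this record does not say how CRPO aggregates its rewards. It then applies the group-relative normalization used by GRPO: each reward minus its group's mean, divided by the group's standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical objective weights: an assumption for illustration only.
# The record does not state how CRPO combines its reward signals.
WEIGHTS = {"accuracy": 1.0, "faithfulness": 0.5, "comprehensiveness": 0.5}


def combined_reward(scores, weights=WEIGHTS):
    """Hypothetical weighted sum of per-objective scores for one response."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of four responses sampled for the same clinical question
# (made-up scores for illustration, not data from the paper).
group = [
    {"accuracy": 1.0, "faithfulness": 1.0, "comprehensiveness": 0.8},
    {"accuracy": 1.0, "faithfulness": 0.0, "comprehensiveness": 0.4},
    {"accuracy": 0.0, "faithfulness": 1.0, "comprehensiveness": 0.6},
    {"accuracy": 0.0, "faithfulness": 0.0, "comprehensiveness": 0.2},
]

# Correctness-only reward versus the hypothetical multi-objective reward.
accuracy_only = group_relative_advantages([s["accuracy"] for s in group])
multi_objective = group_relative_advantages([combined_reward(s) for s in group])
print(accuracy_only)
print(multi_objective)
```

With the accuracy reward alone, the two correct responses receive identical advantages even though only one of them is faithful. Under the weighted reward, they are separated. This is the kind of signal the abstract argues a correctness-only reward misses.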
Access Document
- Files:
- Version of record (PDF, 2.8 MB)
- Publication website:
- https://proceedings.mlr.press/v317/gu26a.html
Authors
- Publisher:
- PMLR
- Host title:
- Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
- Volume:
- 317
- Pages:
- 117-126
- Series:
- Proceedings of Machine Learning Research
- Publication date:
- 2026-03-14
- Acceptance date:
- 2025-12-02
- Event title:
- 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026)
- Event location:
- Singapore
- Event website:
- https://aaai.org/conference/aaai/aaai-26/
- Event start date:
- 2026-01-20
- Event end date:
- 2026-01-21
- ISSN:
- 2640-3498
- Language:
- English
- Pubs id:
- 2358178
- Local pid:
- pubs:2358178
- Deposit date:
- 2026-01-13
- ARK identifier:
Terms of use
- Copyright holder:
- Gu et al
- Copyright date:
- 2026
- Rights statement:
- © 2026 by the author(s). This is an open access article under the CC-BY license.
- Notes:
- This paper was presented at the 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026), 20th-21st January 2026, Singapore, held alongside the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026), 20th-27th January 2026, Singapore.
- Licence:
- CC Attribution (CC BY)