Clinical-R1: empowering large language models for faithful and comprehensive reasoning with clinical objective relative policy optimization

Gu, B; Zhou, H; Segal, BM; Wu, J; Cao, Z; Zhong, H; Clifton, L; Liu, F; Clifton, D

AI Collection

Conference item

Abstract:: Recent advances in large language models (LLMs), exemplified by DeepSeek-R1, have shown strong reasoning capabilities arising from large-scale pretraining and post-training reinforcement learning (RL). However, current post-training methods such as Group Relative Policy Optimization (GRPO) mainly reward correctness, which does not match the multi-dimensional objectives of high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable RL method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based, verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. Experiments on three benchmarks show that CRPO substantially improves the truthfulness and completeness of reasoning compared with standard GRPO, while still delivering solid accuracy gains. This framework provides a scalable pathway for aligning LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare, and highlights the potential of multi-objective, verifiable RL methods for scaling the post-training of LLMs in medical domains.
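The abstract contrasts CRPO with GRPO, which samples a group of responses per prompt and scores each one relative to the rest of its group. The sketch below illustrates only that group-relative step, under loudly stated assumptions: the three objectives (accuracy, faithfulness, comprehensiveness) are each scored in [0, 1] and combined by a fixed weighted sum. The weights and the names `RewardWeights`, `combined_reward` and `group_relative_advantages` are hypothetical and are not taken from the paper; the clipped policy-ratio objective and KL penalty used in GRPO training are omitted.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardWeights:
    # Hypothetical per-objective weights; this record does not state how
    # CRPO actually balances its reward signals.
    accuracy: float = 1.0
    faithfulness: float = 0.5
    comprehensiveness: float = 0.5


def combined_reward(acc: float, faith: float, comp: float,
                    w: RewardWeights) -> float:
    """Assumed aggregation: a fixed weighted sum of three objective scores."""
    return w.accuracy * acc + w.faithfulness * faith + w.comprehensiveness * comp


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its group.

    Uses the population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one prompt, each scored (hypothetically)
# as (accuracy, faithfulness, comprehensiveness) in [0, 1].
scores = [(1.0, 1.0, 0.5), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (0.0, 0.0, 0.0)]
weights = RewardWeights()
rewards = [combined_reward(*s, weights) for s in scores]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.75, 1.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])
```

Because advantages are standardized within the group, a response is pushed up only to the extent that it beats its siblings on the combined score; in this toy group, the correct but unfaithful answer and the faithful but incorrect answer receive the same small positive advantage, while the answer that does well on all three objectives receives the largest one.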

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Gu, B., Zhou, H., Segal, B. M., Wu, J., Cao, Z., Zhong, H., Clifton, L., Liu, F., & Clifton, D. (2026). Clinical-R1: empowering large language models for faithful and comprehensive reasoning with clinical objective relative policy optimization. 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026), 317, 117–126.

MLA Style

Gu, B, et al. “Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization.” 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026), Proceedings of Machine Learning Research, vol. 317, 2026, pp. 117–26.

Chicago Style

Gu, B, H Zhou, BM Segal, J Wu, Z Cao, H Zhong, L Clifton, F Liu, and D Clifton. 2026. “Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization.” In 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026), 317:117–26. Proceedings of Machine Learning Research. PMLR.

Access Document

Files:: Gu_et_al_2026_Clinical-R1_empowering_large.pdf

(Version of record, PDF, 2.8 MB)

Publication website:: https://proceedings.mlr.press/v317/gu26a.html

Authors

Gu, B

Role:: Author

Zhou, H

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Segal, BM

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Wu, J

Role:: Author

Cao, Z

Role:: Author


Publisher:: PMLR
Host title:: Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
Volume:: 317
Pages:: 117-126
Series:: Proceedings of Machine Learning Research
Publication date:: 2026-03-14
Acceptance date:: 2025-12-02
Event title:: 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026)
Event location:: Singapore
Event website:: https://aaai.org/conference/aaai/aaai-26/
Event start date:: 2026-01-20
Event end date:: 2026-01-21
ISSN:: 2640-3498

Language:: English
Pubs id:: 2358178
Local pid:: pubs:2358178
Deposit date:: 2026-01-13
ARK identifier:: ark:/29072/ora_4def5779cdac4c8ea705d72490ab5e85

Terms of use

Copyright holder:: Gu et al
Notes:: This paper was presented at the 2nd AAAI Bridge Program on AI for Medicine and Healthcare (AIMedHealth 2026), 20th-21st January 2026, Singapore, held alongside the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026), 20th-27th January 2026, Singapore.

Licence:: CC Attribution (CC BY)
