Conference item

Position: reinforcement learning in dynamic treatment regimes needs critical reexamination

Abstract:
In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choices of RL formulation in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can vary significantly with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, in some instances RL algorithms can be surpassed by random baselines, depending on the policy evaluation method and reward design. This calls for more careful policy evaluation and algorithm development in future DTR work. Additionally, we discuss potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invite further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://proceedings.mlr.press/v235/luo24f.html

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0003-0015-2023
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MSD
Department:
Clinical Neurosciences
Role:
Author
ORCID:
0000-0003-1023-3927
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Kellogg College
Role:
Author
ORCID:
0000-0002-1552-5630


Publisher:
Journal of Machine Learning Research
Pages:
33432-33465
Series:
Proceedings of Machine Learning Research
Series number:
235
Publication date:
2024-07-29
EISSN:
2640-3498


Language:
English
Pubs id:
2031188
Local pid:
pubs:2031188
Deposit date:
2024-09-25
