Thesis
Dynamic treatment regime for electronic health record
- Abstract:
Dynamic Treatment Regimes (DTRs) aim to personalise treatment through sequential decision policies that adapt to a patient’s evolving state. Reinforcement learning (RL) offers a data-driven approach for learning such policies, but key challenges remain unresolved, including how to reliably evaluate RL methods in clinical settings, how to incorporate meaningful clinical knowledge, and how to ensure generalisability across diverse patient populations.
This thesis addresses these challenges through a structured four-part investigation. First, we introduce DTR-Bench, a modular benchmarking platform that simulates four clinical tasks with noise, pharmacological variability, and missing data. We show that commonly used RL algorithms perform inconsistently under clinical variation.
Second, moving from in silico to real clinical settings, we revisit RL for sepsis treatment using retrospective ICU data. Our analysis standardises reward definitions and applies a range of off-policy evaluation methods across patient subgroups. We advocate stratification techniques to identify populations in which RL may be most beneficial, and we emphasise benchmarking against supervised learning and heuristic baselines.
Third, we focus on intravenous insulin titration and glycaemia prediction in general ICU patients, including those without diagnosed diabetes. Using a curated MIMIC-III dataset, we develop an ensemble evaluation framework for more robust off-policy evaluation and implement RL algorithms whose performance is comparable to that of clinicians.
Finally, we explore large language models (LLMs) as a minimalist alternative to RL-based treatment recommendation. Within a simulated diabetes-control environment, we find that while incorporating clinical priors into RL typically requires extensive engineering, LLMs can effectively absorb structured clinical knowledge through prompting alone, sometimes matching or exceeding RL performance. Nevertheless, we identify clear failure modes in LLM reasoning, such as unit mismatches and unsafe dosing decisions, which highlight important limitations for clinical deployment.
Collectively, these investigations establish reproducible benchmarks, expose failure modes, and provide guidance for safer, more generalisable DTR algorithms. This work lays methodological foundations for the future development and evaluation of intelligent dynamic treatment algorithms.
Access Document
- Files:
- Preview, Dissemination version, pdf, 4.6MB
Authors
Contributors
- Zhu, T
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Role:
- Supervisor
- ORCID:
- 0000-0002-1552-5630
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2025-12-03
- ARK identifier:
Terms of use
- Copyright holder:
- Zhiyao Luo
- Copyright date:
- 2025