Journal article
ArcTEX—a novel clinical data enrichment pipeline to support real-world evidence oncology studies
- Abstract:
- Data stored within electronic health records (EHRs) offer a valuable source of information for real-world evidence (RWE) studies in oncology. However, many key clinical features are available only within unstructured notes. We present ArcTEX, a novel data enrichment pipeline developed to extract oncological features from unstructured NHS clinical notes with high accuracy, even in resource-constrained environments where GPU availability may be limited. By design, the outputs of ArcTEX are free of patient-identifiable information, making the pipeline ideally suited for use within NHS Trust environments. We compare our pipeline with existing discriminative and generative models, demonstrating that it outperforms approaches such as Llama 3/3.1/3.2 and BERT-based models, with a mean accuracy of 98.67% across several essential clinical features in endometrial and breast cancer. Additionally, we show that as few as 50 annotated training examples are needed to adapt the model to a different oncology area, such as lung cancer, with a different set of priority clinical features, achieving a comparable mean accuracy of 95%.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher copy:
- 10.3389/fdgth.2025.1561358
- Publisher:
- Frontiers Media
- Journal:
- Frontiers in Digital Health
- Volume:
- 7
- Article number:
- 1561358
- Publication date:
- 2025-05-09
- Acceptance date:
- 2025-04-23
- DOI:
- 10.3389/fdgth.2025.1561358
- EISSN:
- 2673-253X
- Language:
- English
- Source identifiers:
- 2950680
- Deposit date:
- 2025-05-23
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.