Conference item
ScoEHR: generating synthetic electronic health records using continuous-time diffusion models
- Abstract:
- Global access to statistically and clinically representative patient health data holds potential for advancing disease research, enhancing patient care, and accelerating drug development. However, acquisition of health data such as electronic health records (EHRs) comes with challenges characterised by high costs, time constraints, and concerns related to patient privacy. An approach to tackling these challenges is by using synthetic data. In this paper we introduce ScoEHR, a novel deep learning method for generating synthetic EHRs, which combines an autoencoder with a continuous-time diffusion model. ScoEHR is shown to outperform three baseline synthetic EHR generation frameworks (medGAN, medWGAN, and medBGAN) on two publicly available datasets, MIMIC-III and the Yale New Haven Health System Emergency Department dataset, based on four widely accepted metrics of data utility. Additionally, a blind clinician evaluation was carried out to assess the qualitative realism of the synthetic data generated by ScoEHR. In this evaluation, a patient’s data was labeled as ‘unrealistic’ if at least one clinician found it to be unrealistic. This evaluation showed that existing real EHR data and ScoEHR-generated synthetic data were scored as equally realistic. Our code is available at https://github.com/aanaseer/ScoEHR.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publication website:
- https://proceedings.mlr.press/v219/naseer23a.html
- Publisher:
- Proceedings of Machine Learning Research
- Host title:
- Proceedings of the 8th Machine Learning for Healthcare Conference
- Volume:
- 219
- Pages:
- 489-508
- Publication date:
- 2023-12-22
- Event title:
- 8th Machine Learning for Healthcare Conference (MLHC 2023)
- Event location:
- New York, New York, USA
- Event website:
- https://www.mlforhc.org/2023-agenda
- Event start date:
- 2023-08-11
- Event end date:
- 2023-08-12
- ISSN:
- 2640-3498
- Language:
- English
- Pubs id:
- 1710459
- Local pid:
- pubs:1710459
- Deposit date:
- 2024-04-29
Terms of use
- Copyright holder:
- Naseer et al.
- Copyright date:
- 2023
- Rights statement:
- © 2023 A.A. Naseer et al. Open Access. This article is licensed under the Creative Commons Attribution 4.0 International License.
- Licence:
- CC Attribution (CC BY)