Conference item

ScoEHR: generating synthetic electronic health records using continuous-time diffusion models

Abstract:
Global access to statistically and clinically representative patient health data holds potential for advancing disease research, enhancing patient care, and accelerating drug development. However, acquisition of health data such as electronic health records (EHRs) comes with challenges characterised by high costs, time constraints, and concerns related to patient privacy. An approach to tackling these challenges is by using synthetic data. In this paper we introduce ScoEHR, a novel deep learning method for generating synthetic EHRs, which combines an autoencoder with a continuous-time diffusion model. ScoEHR is shown to outperform three baseline synthetic EHR generation frameworks (medGAN, medWGAN, and medBGAN) on two publicly available datasets, MIMIC-III and the Yale New Haven Health System Emergency Department dataset, based on four widely accepted metrics of data utility. Additionally, a blind clinician evaluation was carried out to assess the qualitative realism of the synthetic data generated by ScoEHR. In this evaluation, a patient’s data was labeled as ‘unrealistic’ if at least one clinician found it to be unrealistic. This evaluation showed that existing real EHR data and ScoEHR generated synthetic data were scored as equally realistic. Our code is available at https://github.com/aanaseer/ScoEHR.
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://proceedings.mlr.press/v219/naseer23a.html

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author
ORCID:
0000-0002-9574-973X


Publisher:
Proceedings of Machine Learning Research
Host title:
Proceedings of the 8th Machine Learning for Healthcare Conference
Volume:
219
Pages:
489-508
Publication date:
2023-12-22
Event title:
8th Machine Learning for Healthcare Conference (MLHC 2023)
Event location:
New York, New York, USA
Event website:
https://www.mlforhc.org/2023-agenda
Event start date:
2023-08-11
Event end date:
2023-08-12
ISSN:
2640-3498


Language:
English
Pubs id:
1710459
Local pid:
pubs:1710459
Deposit date:
2024-04-29