Preprint
Democratising clinical AI through dataset condensation for classical clinical models
- Abstract:
- Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare data democratisation, especially when paired with differential privacy, allowing synthetic data to serve as a safe alternative to real records. However, existing DC methods rely on differentiable neural networks, limiting their compatibility with widely used clinical models such as decision trees and Cox regression. We address this gap using a differentially private, zero-order optimisation framework that extends DC to non-differentiable models using only function evaluations. Empirical results across six datasets, including both classification and survival tasks, show that the proposed method produces condensed datasets that preserve model utility while providing effective differential privacy guarantees, enabling model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
- Preprint server copy:
- 10.48550/arXiv.2603.09356
Authors
- Funder:
- National Institute for Health and Care Research
- Funder identifier:
- https://ror.org/0187kwz08
- Preprint server:
- arXiv
- Publication date:
- 2026-03-10
- DOI:
- Language:
- English
- Pubs id:
- 2393327
- Local pid:
- pubs:2393327
- Deposit date:
- 2026-05-13
- ARK identifier:
Terms of use
- Copyright holder:
- Thakur et al.
- Copyright date:
- 2026
- Licence:
- CC Attribution (CC BY)