Conference item

Navigating severe class imbalance in population cohort data

Abstract:
Class imbalance is a major challenge in predictive modelling for rare disease outcomes, particularly in large-scale population cohorts. Traditional machine learning models often struggle with imbalanced datasets, leading to biased performance metrics and poor generalisability. This study systematically evaluates multiple approaches to mitigating class imbalance when predicting multiple myeloma from proteomic and clinical data in UK Biobank. We compare standard classification models (XGBoost and logistic regression) with synthetic resampling (SMOTE), anomaly detection techniques (isolation forests, local outlier factors, one-class SVM, and autoencoders), and a transformer-based foundation model (TabPFN), using standard classification performance metrics. Our results indicate that anomaly detection models generalise better than the conventional classifiers, while SMOTE fails to improve, and may actively worsen, predictive performance. To address the precision-sensitivity trade-off, we introduce a sequential XGBoost ensemble classifier (SeqXGB) that prioritises high precision over sensitivity to minimise false positive predictions. Compared with a single XGBoost model, SeqXGB reduces false positives in held-out test data from 420 to 9, but at the cost of substantially lower sensitivity (0.15, down from 0.70). Our findings highlight that no single method is universally optimal for addressing class imbalance; rather, model selection should be guided by the clinical application, balancing the risks of false positives and false negatives.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1109/EMBC58623.2025.11254293

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Jesus College
Role:
Author
ORCID:
0000-0002-3116-218X
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0001-5313-4596
Institution:
University of Oxford
Division:
MSD
Department:
Women's & Reproductive Health
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0002-7006-1947
Institution:
University of Oxford
Division:
MSD
Department:
Primary Care Health Sciences
Role:
Author


Funder identifier:
https://ror.org/0187kwz08
Grant:
NIHR302440


Publisher:
IEEE
Publication date:
2025-12-03
Acceptance date:
2025-04-08
Event title:
47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2025)
Event location:
Copenhagen, Denmark
Event website:
https://embc.embs.org/2025/
Event start date:
2025-07-14
Event end date:
2025-07-17
DOI:
10.1109/EMBC58623.2025.11254293
EISSN:
2694-0604
ISSN:
2375-7477

