Journal article : Review

Larger sample sizes are needed when developing a clinical prediction model using machine learning in oncology: methodological systematic review

Abstract:
Background and Objectives
Having a sufficient sample size is crucial when developing a clinical prediction model. We reviewed details of sample size in studies developing prediction models for binary outcomes using machine learning (ML) methods within oncology and compared the sample size used to develop the models with the minimum required sample size needed when developing a regression-based model (Nmin).
Methods
We searched the Medline (via OVID) database for studies developing a prediction model using ML methods published in December 2022. We reviewed how sample size was justified. We calculated Nmin and compared it with the sample size that was used to develop the models.
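As a rough illustration of how a minimum sample size of this kind can be computed, the sketch below implements two widely used criteria for regression-based models with a binary outcome: precise estimation of the overall risk, and a target expected shrinkage factor to limit overfitting. This record does not give the review's exact calculation, so the function names, the default margin of 0.05, the target shrinkage of 0.9, and the example inputs are all assumptions made for illustration; further criteria from published guidance are omitted.

```python
import math


def n_for_overall_risk(prevalence, margin=0.05, z=1.96):
    """Participants needed to estimate the overall risk within +/- margin.

    Assumes a 95% confidence interval (z = 1.96) around the outcome proportion.
    """
    return math.ceil((z / margin) ** 2 * prevalence * (1 - prevalence))


def n_for_shrinkage(n_params, r2_cs, shrinkage=0.9):
    """Participants needed for an expected uniform shrinkage factor >= shrinkage.

    n_params: number of candidate predictor parameters (assumed input).
    r2_cs: anticipated Cox-Snell R-squared of the model (assumed input).
    """
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def minimum_sample_size(prevalence, n_params, r2_cs):
    """Largest of the two criteria above (hypothetical helper, not the full guidance)."""
    return max(
        n_for_overall_risk(prevalence),
        n_for_shrinkage(n_params, r2_cs),
    )


def deficit(n_min, n_used):
    """Positive values mean the study fell short of n_min; negative means a surplus."""
    return n_min - n_used


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any reviewed study.
    n_min = minimum_sample_size(prevalence=0.5, n_params=10, r2_cs=0.1)
    print(n_min, deficit(n_min, n_used=600))
```

Under these assumptions the shrinkage criterion dominates whenever many candidate parameters are paired with a low anticipated R-squared, which is one reason small development datasets tend to fall short.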
Results
Only one of the 36 included studies justified its sample size. We were able to calculate Nmin for 17 (47%) studies. Five of these 17 studies met Nmin, allowing precise estimation of the overall risk and minimizing overfitting. There was a median deficit of 302 participants with the event (n = 17; range: −21,331 to 2298) when developing the ML models. An additional three of the 17 studies met the sample size required to precisely estimate the overall risk only.
Conclusion
Studies developing a prediction model using ML in oncology seldom justified their sample size, and sample sizes were often smaller than Nmin. As ML models almost certainly require a larger sample size than regression models, the true deficit is likely larger than estimated here. We recommend that researchers consider and report their sample size and at least meet the minimum sample size required when developing a regression-based model.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1016/j.jclinepi.2025.111675

Authors

Institution:
University of Oxford
Division:
MSD
Department:
NDORMS
Sub department:
Botnar Institute for Musculoskeletal Sciences
Role:
Author
ORCID:
0000-0002-3217-1512

Role:
Author
ORCID:
0000-0001-9373-6591

Institution:
University of Oxford
Division:
MSD
Department:
NDORMS
Sub department:
Botnar Institute for Musculoskeletal Sciences
Role:
Author
ORCID:
0000-0002-7801-5777


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/Y018516/1

Funder identifier:
https://ror.org/05f950310


Publisher:
Elsevier
Journal:
Journal of Clinical Epidemiology
Volume:
180
Article number:
111675
Place of publication:
United States
Publication date:
2025-01-12
Acceptance date:
2025-01-07
DOI:
10.1016/j.jclinepi.2025.111675
EISSN:
1878-5921
ISSN:
0895-4356
Pmid:
39814217


Language:
English
Subtype:
Review
Pubs id:
2078785
Local pid:
pubs:2078785
Deposit date:
2025-03-17