Extended sample size calculations for evaluation of prediction models using a threshold for classification

Whittle, R; Ensor, J; Archer, L; Collins, GS; Dhiman, P; Denniston, A; Alderman, J; Legha, A; van Smeden, M; Moons, KG; Cazier, J; Riley, RD; Snell, KIE

Journal article

Abstract:: When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures are also often reported. We extend the previously published guidance to precisely estimate threshold-based performance measures. We report closed-form solutions for the sample size required to target sufficiently precise estimates of accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV), together with an iterative method for the sample size required to target a sufficiently precise estimate of the F1-score, in an external evaluation study of a prediction model with a binary outcome. This approach requires the user to pre-specify the target standard error and the expected value for each performance measure, alongside the outcome prevalence. We describe how the sample size formulae were derived and demonstrate their use in an example. Extension to time-to-event outcomes is also considered. In our examples, the minimum sample size required was lower than that required to precisely estimate the calibration slope, and we expect this would most often be the case. Our formulae, along with corresponding Python code and updated R, Stata and Python commands (pmvalsampsize), enable researchers to calculate the minimum sample size needed to precisely estimate threshold-based performance measures in an external evaluation study. These criteria should be used alongside previously published criteria to precisely estimate calibration, discrimination, and net benefit.
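
Illustration (not part of the record): the abstract describes closed-form sample sizes driven by a target standard error, an expected value for the measure, and the outcome prevalence. The sketch below shows how such a calculation could look for accuracy, sensitivity and specificity. It assumes the standard binomial standard error of a proportion, SE = sqrt(p(1 - p) / m), where m is the number of participants contributing to the measure; this is an assumption made for illustration and is not taken from the paper, whose formulae may differ. All function names are hypothetical and are not the pmvalsampsize interface.

```python
import math


def n_for_proportion(expected, target_se, denominator_fraction=1.0):
    """Total sample size so that a proportion has the target standard error.

    Assumes SE = sqrt(p * (1 - p) / m), with m = n * denominator_fraction,
    i.e. only a fraction of the n participants contribute to the measure.
    """
    if not 0 < expected < 1:
        raise ValueError("expected must lie strictly between 0 and 1")
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    if not 0 < denominator_fraction <= 1:
        raise ValueError("denominator_fraction must lie in (0, 1]")
    # Solve the SE expression for n and round up to a whole participant.
    n = expected * (1 - expected) / (target_se ** 2 * denominator_fraction)
    return math.ceil(n)


def n_accuracy(accuracy, target_se):
    # Accuracy is estimated from all participants.
    return n_for_proportion(accuracy, target_se)


def n_sensitivity(sensitivity, prevalence, target_se):
    # Sensitivity is estimated only among participants with the outcome.
    return n_for_proportion(sensitivity, target_se, prevalence)


def n_specificity(specificity, prevalence, target_se):
    # Specificity is estimated only among participants without the outcome.
    return n_for_proportion(specificity, target_se, 1 - prevalence)


if __name__ == "__main__":
    # Hypothetical inputs chosen purely for illustration.
    print(n_accuracy(0.8, 0.025))
    print(n_sensitivity(0.8, 0.1, 0.025))
    print(n_specificity(0.8, 0.1, 0.025))
```

Under this assumption, a low outcome prevalence inflates the total sample size needed for sensitivity, because only participants with the outcome contribute to its estimate. PPV, NPV and the F1-score are omitted here because their denominators depend on further quantities that the abstract does not specify.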

Publication status:: Published

Peer review status:: Peer reviewed

Cite

Cite this record

APA Style

Whittle, R., Ensor, J., Archer, L., Collins, G. S., Dhiman, P., Denniston, A., Alderman, J., Legha, A., van Smeden, M., Moons, K. G., Cazier, J., Riley, R. D., & Snell, K. I. E. (2025). Extended sample size calculations for evaluation of prediction models using a threshold for classification. BMC Medical Research Methodology, 25(1).

MLA Style

Whittle, R., et al. “Extended Sample Size Calculations for Evaluation of Prediction Models Using a Threshold for Classification.” BMC Medical Research Methodology, vol. 25, no. 1, BioMed Central, 2025.

Chicago Style

Whittle, R, J Ensor, L Archer, GS Collins, P Dhiman, A Denniston, J Alderman, et al. 2025. “Extended Sample Size Calculations for Evaluation of Prediction Models Using a Threshold for Classification.” BMC Medical Research Methodology 25 (1).

Access Document

Files:: Whittle_et_al_2025_Extended_sample_size.pdf

(Version of record, pdf, 1.2MB)

Publisher copy:: 10.1186/s12874-025-02592-4

Authors

Whittle, R

Role:: Author

Ensor, J

Role:: Author

Archer, L

Role:: Author

Collins, GS

Institution:: University of Oxford
Division:: MSD
Department:: NDORMS
Sub department:: Centre for Statistics in Medicine
Role:: Author

Dhiman, P

Institution:: University of Oxford
Division:: MSD
Department:: NDORMS
Sub department:: Centre for Statistics in Medicine
Role:: Author

Funding

Cancer Research UK

Funder identifier:: https://ror.org/054225q67

Engineering and Physical Sciences Research Council

Funder identifier:: https://ror.org/0439y7842

Publisher:: BioMed Central
Journal:: BMC Medical Research Methodology
Volume:: 25
Issue:: 1
Article number:: 170
Publication date:: 2025-07-01
Acceptance date:: 2025-05-12
DOI:: 10.1186/s12874-025-02592-4
EISSN:: 1471-2288
ISSN:: 1471-2288

Language:: English
Keywords:: External validation; Performance measures; Threshold; Clinical prediction models; Model evaluation; Classification models; Sample size
Source identifiers:: 3070126
Deposit date:: 2025-07-01

Terms of use

Licence:: CC Attribution (CC BY)
