Journal article

A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models — part 2: time-to-event outcomes

Abstract:
Background: When developing a clinical prediction model using time-to-event data (i.e. with censoring and different lengths of follow-up), previous research has focused on the sample size needed to minimise overfitting and to precisely estimate the overall risk. However, the instability of individual-level risk estimates may still be large.
Methods: We propose using a decomposition of Fisher’s information matrix to help examine and calculate the sample size required to develop a model that aims for precise and fair risk estimates. We propose a six-step process that can be used either before data collection or when an existing dataset is available. Steps 1 to 5 require researchers to specify: the overall risk in the target population at a key time-point of interest; an assumed pragmatic ‘core model’ in the form of an exponential regression model; the (anticipated) joint distribution of the core predictors included in that model; and the distribution of censoring times. The ‘core model’ can be specified directly or based on a specified C-index and relative effects of (standardised) predictors. The joint distribution of predictors may be available directly in an existing dataset, in a pilot study or in a synthetic dataset provided by other researchers.
Results: We derive closed-form solutions that decompose the variance of an individual’s estimated event rate into Fisher’s unit information matrix, predictor values and total sample size; this allows researchers to calculate and examine uncertainty distributions around individual risk estimates and misclassification probabilities for specified sample sizes. We provide an illustrative example in breast cancer, emphasise the importance of clinical context, including any risk thresholds for decision-making, and examine fairness concerns for pre- and postmenopausal women. Lastly, in two empirical evaluations, we provide reassurance that uncertainty interval widths based on our exponential approach are close to those obtained using more flexible parametric models.
Conclusions: Our approach allows users to identify the (target) sample size required to develop a prediction model for time-to-event outcomes, via the pmstabilityss module. It aims to facilitate models with improved trust, reliability and fairness in individual-level predictions.
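The variance decomposition described in the Results can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the pmstabilityss implementation: it assumes a core exponential model with log event rate eta = x'beta, purely administrative censoring at a single fixed time c (so an event is observed with probability 1 - exp(-exp(eta) * c)), and a normal approximation for the estimated log rate. Under these assumptions the expected unit Fisher information is the average of P(event observed | x) * x x' over the predictor distribution, and Var(eta_hat_i) is approximately x_i' I^-1 x_i / N. All function names, the predictor grid and the coefficient values are hypothetical.

```python
import math

# Illustrative sketch only: helper names and inputs are hypothetical,
# and censoring is assumed to be administrative at one fixed time.

def risk_at(log_rate: float, t: float) -> float:
    """Cumulative risk by time t under an exponential model: 1 - exp(-exp(eta) * t)."""
    return 1.0 - math.exp(-math.exp(log_rate) * t)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via erfc (accurate in the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def unit_information(X, beta, censor_time):
    """Expected Fisher information per individual for exponential regression.

    With administrative censoring at censor_time, an individual with predictors x
    contributes P(event observed | x) * x x^T, where P(event observed | x) is
    risk_at(x^T beta, censor_time). Rows of X are a sample from the (anticipated)
    joint predictor distribution; each row starts with 1.0 for the intercept.
    """
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for x in X:
        w = risk_at(sum(b * xj for b, xj in zip(beta, x)), censor_time)
        for j in range(p):
            for k in range(p):
                info[j][k] += w * x[j] * x[k] / len(X)
    return info


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def individual_summary(x, beta, inv_info, n, t, threshold, z_crit=1.96):
    """Point risk, approximate 95% interval and misclassification probability for one individual."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    p = len(x)
    # Decomposition: Var(eta_hat) ~= x^T I_unit^{-1} x / n
    var = sum(x[j] * inv_info[j][k] * x[k] for j in range(p) for k in range(p)) / n
    se = math.sqrt(var)
    point = risk_at(eta, t)
    # Risk is increasing in eta, so the interval endpoints transform directly.
    lower, upper = risk_at(eta - z_crit * se, t), risk_at(eta + z_crit * se, t)
    # Risk threshold expressed on the log-rate scale.
    eta_star = math.log(-math.log(1.0 - threshold) / t)
    below = normal_cdf((eta_star - eta) / se)  # P(estimated risk < threshold)
    p_misclass = below if point >= threshold else 1.0 - below
    return point, lower, upper, p_misclass


if __name__ == "__main__":
    # Hypothetical inputs: one standardised predictor on an even grid,
    # baseline rate 0.05 per year, log rate ratio 0.5, censoring and
    # prediction horizon both at 5 years, risk threshold 0.2.
    X = [[1.0, -2.0 + 0.1 * i] for i in range(41)]
    beta = [math.log(0.05), 0.5]
    inv_info = invert(unit_information(X, beta, censor_time=5.0))
    for n in (500, 2000):
        point, lower, upper, p_mis = individual_summary(
            [1.0, 0.0], beta, inv_info, n, t=5.0, threshold=0.2)
        print(f"N={n}: risk={point:.3f} (95% UI {lower:.3f} to {upper:.3f}), "
              f"P(misclassified)={p_mis:.3f}")
```

Because the unit information is computed once, changing N only rescales the variance by 1/N, so the same individual's uncertainty interval and misclassification probability can be compared cheaply across candidate sample sizes.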
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1186/s41512-025-00204-9

Authors

Institution:
University of Oxford
Division:
MSD
Department:
NDORMS
Sub department:
Centre for Statistics in Medicine
Role:
Author


Funder identifier:
https://ror.org/03x94j517
Funder identifier:
https://ror.org/054225q67


Publisher:
BioMed Central
Journal:
Diagnostic and Prognostic Research
Volume:
9
Issue:
1
Article number:
33
Publication date:
2025-12-16
Acceptance date:
2025-06-27
DOI:
10.1186/s41512-025-00204-9
EISSN:
2397-7523
ISSN:
2397-7523


Language:
English
Pubs id:
2353839
UUID:
uuid_7df66d0c-a278-4e43-8223-75685a5b3bad
Local pid:
pubs:2353839
Source identifiers:
3573311
Deposit date:
2025-12-17
ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
