Journal article

Application of machine learning with MALDI-TOF MS for rapid differentiation between methicillin-susceptible and methicillin-resistant Staphylococcus aureus

Abstract:
Background: Applying machine learning to matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry may allow rapid differentiation between methicillin-susceptible (MSSA) and methicillin-resistant Staphylococcus aureus (MRSA) and enable earlier antibiotic use guided by antimicrobial susceptibility testing (AST), but prior studies reported limited model performance. This study applies novel machine learning techniques to a large dataset to create a prediction model with potential for clinical application.

Methods: This study used one of the largest datasets to date: 24487 Staphylococcus aureus isolates (13776 MRSA and 10711 MSSA) collected between January 2021 and May 2024 in Hong Kong. The spectra were randomly split 80:20 into training and validation sets to develop models of various structures. The top models, namely a large-scale neural network (NN), the LightGBM gradient boosting framework (LGBM), and a weight-averaging ensemble (“ensemble”) of NN and LGBM, underwent prospective testing on 2975 additional clinical isolates (1867 MRSA and 1108 MSSA) and external validation on 1000 spectra (500 MRSA and 500 MSSA) from Taiwan.

Results: The NN, LGBM, and ensemble models all achieved high performance in prospective testing, with accuracy of 0.9284-0.9388 and AUPRC of 0.9843-0.9866. The models were well calibrated, and confidence thresholds that rejected the 20% of predictions with the lowest confidence increased accuracy to 0.9697-0.9777. External validation yielded accuracy of 0.695-0.723 and AUPRC of 0.8409-0.8765, with an increased number of false negatives. Shapley additive explanations revealed top feature groups consistent with previous studies, but feature importance was geographically specific.

Conclusions: We present new machine learning models with high performance in differentiating between MRSA and MSSA. Model performance can be further boosted with confidence thresholds, but the models do not generalize across geographical areas. Clinical applications should use geographically specific models, falling back to traditional AST methods for low-confidence predictions.
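The two ideas the abstract combines, weight-averaging two models' outputs and rejecting low-confidence predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal weighting (`w_nn=0.5`), the use of distance from 0.5 as the confidence measure, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np


def ensemble_proba(p_nn, p_lgbm, w_nn=0.5):
    """Weighted average of two models' predicted MRSA probabilities.

    w_nn is a hypothetical weight; the abstract does not state the value used.
    """
    p_nn = np.asarray(p_nn, dtype=float)
    p_lgbm = np.asarray(p_lgbm, dtype=float)
    return w_nn * p_nn + (1.0 - w_nn) * p_lgbm


def accuracy_with_rejection(p_mrsa, y_true, reject_fraction=0.2):
    """Accuracy on the predictions kept after rejecting the least confident ones.

    Confidence is taken as the distance from the 0.5 decision boundary
    (an assumption; other confidence measures are possible).
    """
    p_mrsa = np.asarray(p_mrsa, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    confidence = np.abs(p_mrsa - 0.5)
    n_reject = int(np.floor(reject_fraction * len(p_mrsa)))
    # Sort ascending by confidence and drop the first n_reject indices.
    keep = np.argsort(confidence, kind="stable")[n_reject:]
    y_pred = (p_mrsa[keep] >= 0.5).astype(int)
    return float(np.mean(y_pred == y_true[keep])), keep


# Toy, made-up data: 1 = MRSA, 0 = MSSA.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
p_nn = np.array([0.95, 0.80, 0.10, 0.45, 0.40, 0.20, 0.90, 0.60, 0.85, 0.05])
p_lgbm = np.array([0.90, 0.70, 0.20, 0.35, 0.50, 0.10, 0.95, 0.50, 0.75, 0.15])

p_ens = ensemble_proba(p_nn, p_lgbm)
acc_all, _ = accuracy_with_rejection(p_ens, y_true, reject_fraction=0.0)
acc_kept, kept = accuracy_with_rejection(p_ens, y_true, reject_fraction=0.2)
print(f"accuracy, all predictions: {acc_all:.2f}")
print(f"accuracy, after rejecting 20%: {acc_kept:.2f} on {len(kept)} isolates")
```

In a workflow like the one the abstract recommends, the rejected isolates would be the ones referred to traditional AST. The gain on this toy data says nothing about real spectra; it only shows the mechanism.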
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1371/journal.pcbi.1013760

Authors

Role:
Author
ORCID:
0000-0002-9491-9020
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Sub department:
Statistics
Role:
Author


Publisher:
Public Library of Science
Journal:
PLoS Computational Biology
Volume:
22
Issue:
5
Pages:
e1013760
Article number:
e1013760
Publication date:
2026-05-05
Acceptance date:
2025-11-17
DOI:
10.1371/journal.pcbi.1013760
EISSN:
1553-7358
ISSN:
1553-734X


Language:
English
Source identifiers:
4038145
Deposit date:
2026-05-12
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
