Can general purpose large language models assist pediatricians in predicting infants with serious bacterial infection?

Šimunović, I; Rezić, K; Franić, N; Boduljak, G; Batinić, M; Jukić, I; Jelovina, I; Biočić, J; Pogorelić, Z; Markić, J

AI Collection

Journal article

Abstract:: Background: Serious bacterial infection (SBI) in neonates and young infants often presents with nonspecific symptoms and clinical signs in the early stages of illness, making early diagnosis challenging. Timely recognition and appropriate treatment are essential to prevent adverse outcomes. While several clinical algorithms are widely used for SBI risk stratification, these tools have limitations, particularly low positive predictive value (PPV). This study evaluates the diagnostic accuracy of general-purpose large language models (LLMs) in detecting SBI in neonates and infants under 90 days of age admitted to the emergency department. Our objective is to improve diagnostic precision, reduce unnecessary interventions, and enhance patient outcomes. LLM performance was compared against traditional machine learning models, state-of-the-art rule-based methods, and an ensemble of physicians to assess the potential of LLMs as clinical decision-support tools in scenarios of diagnostic uncertainty.

Results: On a dataset of 742 patients, LLMs demonstrated diagnostic accuracy comparable to traditional machine learning models and state-of-the-art rule-based methods. The optimized, class-weighted CatBoost model achieved the best overall performance, with a PPV of 0.70, negative predictive value (NPV) of 0.90, sensitivity of 0.54, specificity of 0.95, F1-score of 0.60, and Matthews correlation coefficient (MCC) of 0.54, outperforming the baseline CatBoost model and achieving results on par with LLMs and physicians. When optimally prompted, LLMs performed on par with ensembles of experienced clinicians. Additionally, LLMs exhibited effective medical reasoning and provided credible diagnostic predictions, which is particularly valuable in cases of clinician uncertainty. The models achieved balanced performance across multiple evaluation metrics, including PPV, NPV, sensitivity, specificity, F1-score, and MCC. ChatGPT-4o achieved a sensitivity of 0.65, specificity of 0.83, and MCC of 0.41. Claude Sonnet 3.5 reached a sensitivity of 0.60, specificity of 0.86, and MCC of 0.42. Google Gemini 2.0 Flash had lower sensitivity (0.43) but the highest specificity (0.94), with an MCC of 0.43. In comparison, the best-performing individual pediatrician achieved a higher sensitivity (0.74) but lower specificity (0.68), with an MCC of 0.33, while the pediatricians' majority vote yielded a sensitivity of 0.69, specificity of 0.81, and MCC of 0.43, comparable to the top-performing LLMs.

Conclusions: These artificial intelligence tools offer a promising direction for SBI risk prediction, achieving performance comparable to that of experienced pediatric specialists while keeping use and data preprocessing simple for potential real-world applications.
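
For readers unfamiliar with the metrics reported in the abstract, the following is a minimal sketch of how PPV, NPV, sensitivity, specificity, F1-score, and MCC are conventionally derived from binary confusion-matrix counts, and of how a majority vote over several raters can be formed. It is not the authors' code: the function names `binary_metrics` and `majority_vote` and the example counts are hypothetical and chosen for illustration only, and the study's own counts are not given in this record.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Undefined ratios (zero denominator) are returned as NaN.
    """
    def safe_div(num, den):
        return num / den if den else float("nan")

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": safe_div(tp, tp + fn),   # recall on infected infants
        "specificity": safe_div(tn, tn + fp),   # recall on non-infected infants
        "ppv": safe_div(tp, tp + fp),           # precision
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


def majority_vote(votes):
    """Return 1 if strictly more than half of the 0/1 votes are 1, else 0."""
    return int(2 * sum(votes) > len(votes))


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    metrics = binary_metrics(tp=40, fp=10, tn=90, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")

    # Three hypothetical raters: two predict SBI (1), one does not (0).
    print("majority:", majority_vote([1, 0, 1]))
```

MCC uses all four cells of the confusion matrix, which makes it a more informative single summary than accuracy when the positive class is rare; that may be why it appears alongside sensitivity and specificity here.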

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Šimunović, I., Rezić, K., Franić, N., Boduljak, G., Batinić, M., Jukić, I., Jelovina, I., Biočić, J., Pogorelić, Z., & Markić, J. (2025). Can general purpose large language models assist pediatricians in predicting infants with serious bacterial infection? BMC Medical Informatics and Decision Making, 25(1).

MLA Style

Šimunović, I, et al. “Can General Purpose Large Language Models Assist Pediatricians in Predicting Infants with Serious Bacterial Infection?” BMC Medical Informatics and Decision Making, vol. 25, no. 1, 2025.

Chicago Style

Šimunović, I, K Rezić, N Franić, et al. 2025. “Can General Purpose Large Language Models Assist Pediatricians in Predicting Infants with Serious Bacterial Infection?” BMC Medical Informatics and Decision Making 25 (1).

Access Document

Files:: Simunovic_et_al_2025_Can_general_purpose.pdf

(Preview, Version of record, pdf, 1.2MB, Terms of use)

Publisher copy:: 10.1186/s12911-025-03258-3

Authors

Šimunović, I

Role:: Author

Rezić, K

Role:: Author

Franić, N

Role:: Author

Boduljak, G

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Sub department:: Computer Science
Role:: Author

Batinić, M

Role:: Author

Publisher:: BioMed Central
Journal:: BMC Medical Informatics and Decision Making
Volume:: 25
Issue:: 1
Article number:: 423
Publication date:: 2025-11-14
Acceptance date:: 2025-10-22
DOI:: 10.1186/s12911-025-03258-3
EISSN:: 1472-6947
ISSN:: 1472-6947

Language:: English
Keywords:: Pediatrics; Diagnostics; Large language models; Serious bacterial infection; Prediction; Infectology; Machine learning
Pubs id:: 2350293
UUID:: uuid_fc425c13-fb5e-4008-ab69-157c70ff56cb
Local pid:: pubs:2350293
Source identifiers:: 3475850
Deposit date:: 2025-11-15
ARK identifier:: ark:/29072/ora_fc425c13fb5e4008ab69157c70ff56cb

Terms of use

Licence:: CC Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
