Benchmarking Large Language Models Using a Best Evidence Topic Report in a Patient With Early Non-Small Cell Lung Cancer

Chaudhuri, V; Brunelli, A; Tcherveniakov, P; Chaudhuri, N

AI Collection

Journal article : Review

Abstract:: Objectives: Large language models (LLMs) are generative AI systems that produce text output resembling human conversation. We wanted to assess the ability of LLMs to answer a patient’s questions and benchmark their output using a best evidence topic (BET). Methods: We asked LLMs whether robot-assisted thoracic surgery (RATS) or video-assisted thoracoscopic surgery (VATS) lobectomy had better perioperative outcomes for postoperative pain, length of hospital stay (LOS) and mortality. A BET was constructed according to a structured protocol for the same questions. An initial search yielded 324 papers, of which 12 represented the best evidence. Results: LLM outputs were almost instantaneous, whereas the BET took many hours of searching a database for relevant evidence. However, current iterations and models of LLMs did not provide relevant outputs, suffered from hallucinations, and could be restricted by copyright and paywall issues. The BET, on the other hand, was tailored to the scenario by specialist human oversight and was therefore more reliable and nuanced. Conclusions: There were no major differences between RATS and VATS lobectomy for T1cN0M0 NSCLC apart from shorter LOS following RATS. Current LLMs may not be entirely reliable for answering clinical questions. An LLM-BET protocol could be used as a standardized process to compare LLM outputs for different clinical scenarios, each benchmarked with a BET. It could also be used to analyse the outputs of different current and future LLMs.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Chaudhuri, V., Brunelli, A., Tcherveniakov, P., & Chaudhuri, N. (2026). Benchmarking Large Language Models Using a Best Evidence Topic Report in a Patient With Early Non-Small Cell Lung Cancer. Interdisciplinary Cardiovascular and Thoracic Surgery, 41(2).

MLA Style

Chaudhuri, V, et al. “Benchmarking Large Language Models Using a Best Evidence Topic Report in a Patient With Early Non-Small Cell Lung Cancer.” Interdisciplinary Cardiovascular and Thoracic Surgery, vol. 41, no. 2, 2026.

Chicago Style

Chaudhuri, V, A Brunelli, P Tcherveniakov, and N Chaudhuri. 2026. “Benchmarking Large Language Models Using a Best Evidence Topic Report in a Patient With Early Non-Small Cell Lung Cancer.” Interdisciplinary Cardiovascular and Thoracic Surgery 41 (2).

Access Document

Files:: Chaudhuri_et_al_2026_Benchmarking_Large_Language.pdf

(Version of record, pdf, 1.2MB)

Publisher copy:: 10.1093/icvts/ivag038

Authors

Chaudhuri, V

Institution:: University of Oxford
Role:: Author
ORCID:: 0009-0005-7670-560X

Brunelli, A

Role:: Author

Tcherveniakov, P

Role:: Author

Chaudhuri, N

Role:: Author
ORCID:: 0000-0003-4204-6923

Publisher:: Oxford University Press
Journal:: Interdisciplinary Cardiovascular and Thoracic Surgery
Volume:: 41
Issue:: 2
Article number:: ivag038
Publication date:: 2026-02-06
Acceptance date:: 2026-01-18
DOI:: 10.1093/icvts/ivag038
EISSN:: 2753-670X
ISSN:: 2753-670X

Language:: English
Keywords:: NSCLC—non-small cell lung cancer; ChatGPT; Gemini; Grok; Microsoft Copilot; VATS—video-assisted thoracoscopic surgery; RATS—robotic-assisted thoracoscopic surgery
Subtype:: Review
Source identifiers:: 3787911
Deposit date:: 2026-02-23
ARK identifier:: ark:/29072/ora_279cbfa900db480e8b9d4b4f220164ef

Terms of use

Licence:: CC Attribution (CC BY)
