AI Collection

Conference item: Poster

Automated extraction of artificial intelligence model and dataset characteristics from papers to promote transparency

Abstract:: We demonstrate that large language models can accurately extract structured model and dataset characteristics from AI research papers using the ROADMAP ontology. Using 10 benchmark papers and subsequently scaling to 311 publications, GPT-5 produced the highest-fidelity structured outputs at low cost. This enables large-scale aggregation of models, datasets, metrics, and content codes, supporting reproducibility, discoverability, and transparency. Structured outputs are publicly accessible through the ATLAS online repository.

Publication status:: Accepted

Peer review status:: Peer reviewed


Cite this record

APA Style

Suri, A., Gonzales, R. A., Takahashi, M. S., & Kahn, C. E. (2026). Automated extraction of artificial intelligence model and dataset characteristics from papers to promote transparency. Amplify Informatics Conference (Amplify 2026).

MLA Style

Suri, A, et al. “Automated Extraction of Artificial Intelligence Model and Dataset Characteristics from Papers to Promote Transparency.” Amplify Informatics Conference (Amplify 2026), 2026.

Chicago Style

Suri, A, RA Gonzales, MS Takahashi, and CE Kahn. 2026. “Automated Extraction of Artificial Intelligence Model and Dataset Characteristics from Papers to Promote Transparency.” In Amplify Informatics Conference (Amplify 2026). American Medical Informatics Association.

Access Document

Files:: Suri_et_al_2026_Automated_extraction_of.pdf

(Accepted manuscript, pdf, 464.4KB)

Authors

Suri, A

Role:: Author

Gonzales, RA

Institution:: University of Oxford
Division:: MSD
Department:: Radcliffe Department of Medicine
Oxford college:: Balliol College
Role:: Author
ORCID:: 0000-0002-9384-4602

Takahashi, MS

Role:: Author

Kahn, CE

Role:: Author

Publisher:: American Medical Informatics Association
Article number:: 201
Acceptance date:: 2026-02-19
Event title:: Amplify Informatics Conference (Amplify 2026)
Event location:: Denver, Colorado, USA
Event website:: https://amia.org/education-events/amplify-informatics-conference
Event start date:: 2026-05-18
Event end date:: 2026-05-21

Language:: English
Subtype:: Poster
Pubs id:: 2382513
Local pid:: pubs:2382513
Deposit date:: 2026-02-28
ARK identifier:: ark:/29072/ora_97404f9be1554712a9a322eae75ec10a

Terms of use

Copyright date:: 2026
Notes:: The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.

Licence:: CC Attribution (CC BY)
