Conference item: Poster

Automated extraction of artificial intelligence model and dataset characteristics from papers to promote transparency

Abstract:
We demonstrate that large language models can accurately extract structured model and dataset characteristics from AI research papers using the ROADMAP ontology. In an evaluation on 10 benchmark papers, subsequently scaled to 311 publications, GPT-5 produced the highest-fidelity structured outputs at low cost. This approach enables large-scale aggregation of models, datasets, metrics, and content codes, supporting reproducibility, discoverability, and transparency. The structured outputs are publicly accessible through the ATLAS online repository.
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MSD
Department:
Radcliffe Department of Medicine
Oxford college:
Balliol College
Role:
Author
ORCID:
0000-0002-9384-4602


Publisher:
American Medical Informatics Association
Article number:
201
Acceptance date:
2026-02-19
Event title:
Amplify Informatics Conference (Amplify 2026)
Event location:
Denver, Colorado, USA
Event website:
https://amia.org/education-events/amplify-informatics-conference
Event start date:
2026-05-18
Event end date:
2026-05-21


Language:
English
Subtype:
Poster
Pubs id:
2382513
Local pid:
pubs:2382513
Deposit date:
2026-02-28
ARK identifier:
