Conference item: Poster
Automated extraction of artificial intelligence model and dataset characteristics from papers to promote transparency
- Abstract:
- We demonstrate that large language models can accurately extract structured model and dataset characteristics from AI research papers using the ROADMAP ontology. In an evaluation on 10 benchmark papers, subsequently scaled to 311 publications, GPT-5 produced the highest-fidelity structured outputs at low cost. This enables large-scale aggregation of models, datasets, metrics, and content codes, supporting reproducibility, discoverability, and transparency. Structured outputs are publicly accessible through the ATLAS online repository.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (pdf, 464.4KB)
Authors
- Publisher:
- American Medical Informatics Association
- Article number:
- 201
- Acceptance date:
- 2026-02-19
- Event title:
- Amplify Informatics Conference (Amplify 2026)
- Event location:
- Denver, Colorado, USA
- Event website:
- https://amia.org/education-events/amplify-informatics-conference
- Event start date:
- 2026-05-18
- Event end date:
- 2026-05-21
- Language:
- English
- Subtype:
- Poster
- Pubs id:
- 2382513
- Local pid:
- pubs:2382513
- Deposit date:
- 2026-02-28
- ARK identifier:
Terms of use
- Copyright date:
- 2026
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)