Preprint
OptimusKG: unifying biomedical knowledge in a modern multimodal graph
- Abstract:
- Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges while retaining granular, type-specific properties, cross-references, and provenance across these domains. We assessed the validity of OptimusKG by using PaperQA3, a multimodal agent, to evaluate whether graph relationships are supported by evidence in the scientific literature. PaperQA3 identified supporting evidence for 70.0% of sampled edges, whereas 83.4% of sampled false edges received no supporting evidence. Edges without literature support were concentrated in associations derived from experimental and functional genomics resources, suggesting that OptimusKG captures biomedical knowledge that may precede synthesis in the scientific literature. OptimusKG is distributed as Apache Parquet files, providing a standardized resource for graph-based machine learning, knowledge-grounded retrieval with large language models, and biomedical discovery use cases such as hypothesis generation.
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
- Preprint server copy:
- 10.48550/arXiv.2604.27269
- Preprint server:
- arXiv
- Publication date:
- 2026-04-29
- EISSN:
- 2331-8422
- Language:
- English
- Pubs id:
- 2433694
- Local pid:
- pubs:2433694
- Source identifiers:
- W7159592703
- Deposit date:
- 2026-06-15
Terms of use
- Copyright holder:
- Vittor et al.
- Copyright date:
- 2026
- Rights statement:
- ©2026 The Authors. This paper is an open access article distributed under the terms of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)
- Licence:
- CC Attribution (CC BY)