Conference item
LLM-EmBEditor: universal base editing efficiency prediction via large language model embeddings
- Abstract:
-
Base editing efficiency varies dramatically across experiments, creating critical bottlenecks for therapeutic applications. While current specialized computational models achieve good performance on individual base editor types, they require extensive feature engineering, custom architectures, and editor-specific optimization that limit practical applicability across diverse base editing systems. We introduce LLM-EmBEditor, a holistic base editing efficiency rate prediction model that leverages Large Language Models (LLMs). By encoding base editing features as comma-separated key-value strings, our method extracts rich contextual embeddings and trains lightweight regression heads, eliminating both specialized architectures and manual feature engineering while seamlessly integrating sequence and numerical features through natural language formatting.
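The pipeline described above (serialize features as text, embed the text, fit a lightweight regression head) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`editor`, `target`, `edit_position`), the `key: value` separator, the placeholder efficiency values, and the closed-form ridge head are all hypothetical, and `toy_embed` is a deterministic hash-based stand-in for the pooled LLM embedding so that the example runs without any model download.

```python
import hashlib
import numpy as np

def serialize(features):
    """Render a feature dict as one comma-separated key-value string."""
    return ", ".join(f"{key}: {value}" for key, value in features.items())

def toy_embed(text, dim=32):
    """Stand-in for an LLM encoder (assumption): map each whitespace token
    to a pseudo-random vector seeded by its hash, then mean-pool."""
    token_vectors = []
    for token in text.replace(",", " ").split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:8], "big")
        token_vectors.append(np.random.default_rng(seed).standard_normal(dim))
    return np.mean(token_vectors, axis=0)

def fit_ridge_head(X, y, alpha=1.0):
    """Closed-form ridge regression on the embeddings.
    A bias column is appended and regularized along with the weights."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    gram = X1.T @ X1 + alpha * np.eye(X1.shape[1])
    return np.linalg.solve(gram, X1.T @ y)

def predict(weights, X):
    """Apply the fitted head to a matrix of embeddings."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ weights

# Hypothetical records mixing both editor types; values are placeholders.
records = [
    {"editor": "ABE", "target": "ACGTACGTACGTACGTACGT", "edit_position": 5},
    {"editor": "CBE", "target": "TTGCACGTAGCTAGGATCCA", "edit_position": 6},
    {"editor": "ABE", "target": "GGATCCATTGCACGTAGCTA", "edit_position": 4},
]
efficiencies = np.array([0.40, 0.25, 0.55])  # placeholder labels, not real data

texts = [serialize(record) for record in records]
X = np.stack([toy_embed(text) for text in texts])
head = fit_ridge_head(X, efficiencies)
predictions = predict(head, X)
```

Because sequence and numerical features share one string, adding a new feature only means adding a key to the dict; the embedding and regression steps stay unchanged. In a real setting `toy_embed` would be replaced by pooled hidden states from a pretrained language model.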
To our knowledge, our model, LLM-EmBEditor, is the first to successfully leverage mixed base editor datasets, enabling effective transfer learning across different base editor types (ABE and CBE), while existing models are limited to a single editor type. This cross-editor generalization capability allows LLM-EmBEditor to achieve strong performance, outperforming traditional benchmarks by 13.5% in Pearson's r and 18.8% in Spearman's ρ on the ABE combined dataset (Pearson's r = 0.717, Spearman's ρ = 0.797 vs. r = 0.632, ρ = 0.671 for FORECasT-BE) and achieving competitive performance on the CBE combined dataset (r = 0.662, ρ = 0.689 vs. r = 0.739, ρ = 0.816 for igRNA-ABE), while uniquely providing predictions across all editor types with r = 0.705, ρ = 0.775, and R² = 0.496. Our ablation study confirms the robustness of our approach across multiple model components, such as transformer architecture, pooling strategy, and model size.
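The reported metrics and relative improvements can be reproduced in outline as below. This is a sketch under stated assumptions: the helper names are hypothetical, the Spearman helper assumes no tied values, and the improvement is taken to be a relative gain over the baseline coefficient, which the abstract does not state explicitly.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """1-based ranks; assumes no ties (a simplifying assumption)."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

def relative_gain_percent(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

# Rounded coefficients quoted in the abstract for the ABE combined dataset.
gain_r = relative_gain_percent(0.717, 0.632)
gain_rho = relative_gain_percent(0.797, 0.671)
```

With the rounded coefficients quoted above, `gain_r` comes out at roughly 13.4% and `gain_rho` at roughly 18.8%. The small gap to the stated 13.5% is consistent with the figure having been computed from unrounded coefficients, though that is an assumption.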
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
Authors
- Publisher:
- Association for Computing Machinery
- Series:
- International Conference Proceedings by ACM
- Acceptance date:
- 2025-11-18
- Event title:
- 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025)
- Event location:
- Tokyo, Japan
- Event website:
- https://www.iccbb.org/
- Event start date:
- 2025-12-21
- Event end date:
- 2025-12-23
- Language:
-
English
- Keywords:
- Pubs id:
-
2368988
- Local pid:
-
pubs:2368988
- Deposit date:
-
2026-02-08
- ARK identifier:
Terms of use
- Copyright holder:
- Schneider and Minary
- Copyright date:
- 2025
- Notes:
-
This paper was presented at the 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025), 21-23 December 2025, Tokyo, Japan. This is the accepted manuscript version of the paper. The final version is forthcoming from the Association for Computing Machinery.
For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.
- Licence:
- CC Attribution (CC BY)