Conference item

LLM-EmBEditor: universal base editing efficiency prediction via large language model embeddings

Abstract:
Base editing efficiency varies dramatically across experiments, creating critical bottlenecks for therapeutic applications. Current specialized computational models perform well on individual base editor types, but they require extensive feature engineering, custom architectures, and editor-specific optimization, which limits their practical applicability across diverse base editing systems. We introduce LLM-EmBEditor, a unified base editing efficiency prediction model built on Large Language Model (LLM) embeddings. Our method encodes base editing features as comma-separated key-value strings, extracts contextual embeddings from an LLM, and trains lightweight regression heads on them. This removes the need for specialized architectures and manual feature engineering, and the natural-language formatting lets sequence and numerical features share a single input.
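The encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`editor`, `protospacer`, `edit_position`, `gc_content`), the `key: value` separator, the three-decimal float formatting, and the mean-pooling helper are all assumptions made for illustration. The record only states that features are written as comma-separated key-value strings and that pooling strategy is one of the ablated components.

```python
from typing import List, Mapping, Sequence, Union

Value = Union[str, int, float]


def serialize_features(features: Mapping[str, Value]) -> str:
    """Join features into a 'key: value, key: value' string, in insertion order.

    The exact key-value syntax is an assumption; the abstract only says
    'comma-separated key-value strings'.
    """
    parts = []
    for key, value in features.items():
        if isinstance(value, float):
            # Fixed precision keeps numeric features textually consistent.
            value = f"{value:.3f}"
        parts.append(f"{key}: {value}")
    return ", ".join(parts)


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one fixed-size embedding (one possible pooling)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Hypothetical record mixing sequence and numerical features.
example = {
    "editor": "ABE",
    "protospacer": "ACGTACGTACGTACGTACGT",
    "edit_position": 6,
    "gc_content": 0.5,
}
text = serialize_features(example)
print(text)

# Placeholder token vectors standing in for LLM hidden states.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(pooled)
```

In a full pipeline, the serialized string would be passed to an LLM, its hidden states pooled into a single vector, and that vector fed to a small regression head that predicts editing efficiency.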

To our knowledge, our model, LLM-EmBEditor, is the first to successfully leverage mixed base editor datasets, enabling effective transfer learning across different base editor types (ABE and CBE), while existing models are limited to a single editor type. This cross-editor generalization capability allows LLM-EmBEditor to achieve strong performance, outperforming traditional benchmarks by 13.5% in Pearson’s R and 18.8% in Spearman’s ρ on the ABE combined dataset (Pearson’s R = 0.717, Spearman’s ρ = 0.797 vs. R = 0.632, ρ = 0.671 for FORECasT-BE) and achieving competitive performance on the CBE combined dataset (R = 0.662, ρ = 0.689 vs. R = 0.739, ρ = 0.816 for igRNA-ABE), while uniquely providing predictions across all editor types with R = 0.705, ρ = 0.775, and R² = 0.496. Our ablation study confirms the robustness of our approach across multiple model components, such as transformer architecture, pooling strategy, and model size.
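The reported improvements on the ABE combined dataset can be checked with a short calculation, assuming the percentages are relative gains over the FORECasT-BE scores (the record does not state how they were computed). The helper name `relative_gain` is hypothetical.

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Rounded scores quoted in the abstract for the ABE combined dataset.
pearson_gain = relative_gain(0.717, 0.632)
spearman_gain = relative_gain(0.797, 0.671)

print(f"Pearson gain:  {pearson_gain:.2f}%")
print(f"Spearman gain: {spearman_gain:.2f}%")
```

With the rounded scores, the Spearman gain comes to about 18.8%, matching the abstract, and the Pearson gain to about 13.4%. The small difference from the stated 13.5% is consistent with the authors having computed it from unrounded scores, though that is an inference rather than something the record confirms.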
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
Keble College
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
ORCID:
0000-0002-1779-6741


Publisher:
Association for Computing Machinery
Series:
International Conference Proceedings by ACM
Acceptance date:
2025-11-18
Event title:
9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025)
Event location:
Tokyo, Japan
Event website:
https://www.iccbb.org/
Event start date:
2025-12-21
Event end date:
2025-12-23
