LLM-EmBEditor: universal base editing efficiency prediction via
large language model embeddings

Schneider, L; Minary, P

Conference item

Abstract:: Base editing efficiency varies dramatically across experiments, creating critical bottlenecks for therapeutic applications. While current specialized computational models achieve good performance on individual base editor types, they require extensive feature engineering, custom architectures, and editor-specific optimization that limit practical applicability across diverse base editing systems. We introduce LLM-EmBEditor, a holistic base editing efficiency rate prediction model that leverages Large Language Models (LLMs). By encoding base editing features as comma-separated key-value strings, our method extracts rich contextual embeddings and trains lightweight regression heads, eliminating both specialized architectures and manual feature engineering while seamlessly integrating sequence and numerical features through natural language formatting.
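The serialization and pooling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names (`editor`, `sequence`, `position`), the `key: value` separator, and the hash-based `toy_token_embeddings` function are all hypothetical stand-ins. In a real pipeline the token vectors would come from an LLM's hidden states, and the regression head would be trained on measured efficiencies.

```python
import hashlib
from typing import Dict, List, Sequence, Union

Feature = Union[str, int, float]


def encode_features(features: Dict[str, Feature]) -> str:
    """Serialize sequence and numerical features as one comma-separated key-value string."""
    return ", ".join(f"{key}: {value}" for key, value in features.items())


def toy_token_embeddings(text: str, dim: int = 8) -> List[List[float]]:
    """Hypothetical stand-in for an LLM: one deterministic vector per whitespace token.

    A real pipeline would return the model's per-token hidden states instead.
    dim must not exceed 32, the length of a SHA-256 digest in bytes.
    """
    vectors = []
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the token vectors into a single fixed-size embedding."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def linear_head(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Forward pass of a lightweight linear regression head (weights are untrained here)."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


# Hypothetical example record; the field names are illustrative only.
example = {"editor": "ABE", "sequence": "ACGTACGTACGTACGTACGT", "position": 5}
text = encode_features(example)
embedding = mean_pool(toy_token_embeddings(text))
prediction = linear_head(embedding, weights=[0.0] * len(embedding), bias=0.5)
```

Mean pooling is only one possible choice here; the abstract notes that the pooling strategy is among the components varied in the ablation study.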

To our knowledge, our model, LLM-EmBEditor, is the first to successfully leverage mixed base editor datasets, enabling effective transfer learning across different base editor types (ABE and CBE), while existing models are limited to a single editor type. This cross-editor generalization capability allows LLM-EmBEditor to achieve strong performance, outperforming traditional benchmarks by 13.5% in Pearson’s R and 18.8% in Spearman’s 𝜌 on the ABE combined dataset (Pearson’s 𝑅 = 0.717, Spearman’s 𝜌 = 0.797 vs. 𝑅 = 0.632, 𝜌 = 0.671 for FORECasT-BE) and achieving competitive performance on the CBE combined dataset (𝑅 = 0.662, 𝜌 = 0.689 vs. 𝑅 = 0.739, 𝜌 = 0.816 for igRNA-ABE), while uniquely providing predictions across all editor types with 𝑅 = 0.705, 𝜌 = 0.775, and 𝑅² = 0.496. Our ablation study confirms the robustness of our approach across multiple model components, such as transformer architecture, pooling strategy, and model size.
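The percentage gains quoted for the ABE combined dataset read as relative improvements over the FORECasT-BE baseline. The short check below assumes that interpretation, which the abstract does not state explicitly, and uses only the rounded correlation values given above, so the results can differ slightly from the reported figures in the last digit.

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Correlations on the ABE combined dataset, as quoted in the abstract.
pearson_gain = relative_gain(0.717, 0.632)
spearman_gain = relative_gain(0.797, 0.671)

print(f"Pearson gain:  {pearson_gain:.2f}%")
print(f"Spearman gain: {spearman_gain:.2f}%")
```

Both values land within about 0.1 percentage points of the reported 13.5% and 18.8%.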

Publication status:: Accepted

Peer review status:: Peer reviewed

Cite this record

APA Style

Schneider, L., & Minary, P. (2025). LLM-EmBEditor: universal base editing efficiency prediction via large language model embeddings. 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025).

MLA Style

Schneider, L, and P Minary. “LLM-EmBEditor: Universal Base Editing Efficiency Prediction via Large Language Model Embeddings.” 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025), International Conference Proceedings by ACM, 2025.

Chicago Style

Schneider, L, and P Minary. 2025. “LLM-EmBEditor: Universal Base Editing Efficiency Prediction via Large Language Model Embeddings.” In 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025). International Conference Proceedings by ACM. Association for Computing Machinery.

Access Document

Files:: Minary_and_Schneider_2025_LLM-EmBEditor_universal_base.pdf

(Accepted manuscript, PDF, 3.0 MB)

Authors

Schneider, L

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Oxford college:: Keble College
Role:: Author

Minary, P

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author
ORCID:: 0000-0002-1779-6741

Publisher:: Association for Computing Machinery
Series:: International Conference Proceedings by ACM
Acceptance date:: 2025-11-18
Event title:: 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025)
Event location:: Tokyo, Japan
Event website:: https://www.iccbb.org/
Event start date:: 2025-12-21
Event end date:: 2025-12-23

Language:: English
Keywords:: base editing; large language models; embedding-based regression; LLM embeddings; transformer architectures; pooling strategies; base editing efficiency prediction; machine learning
Pubs id:: 2368988
Local pid:: pubs:2368988
Deposit date:: 2026-02-08
ARK identifier:: ark:/29072/ora_68c4f07dcbda49a292cabc6c8556ab56

Terms of use

Copyright holder:: Schneider and Minary
Notes:: This paper was presented at the 9th International Conference on Computational Biology and Bioinformatics (ICCBB 2025), 21-23 December 2025, Tokyo, Japan. This is the accepted manuscript version of the paper. The final version is forthcoming from the Association for Computing Machinery.
For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.

Licence:: CC Attribution (CC BY)
