Journal article

Transfer Learning for Heterocycle Retrosynthesis

Abstract:
Heterocycles are important scaffolds in medicinal chemistry that can be used to modulate the binding mode as well as the pharmacokinetic properties of drugs. The importance of heterocycles has been exemplified by the publication of numerous data sets containing heterocyclic rings and their properties. However, those data sets lack synthetic routes toward the published heterocycles. Consequently, novel and uncommon heterocycles are not easily synthetically accessible. While retrosynthetic prediction models could usually be used to assist synthetic chemists, their performance is poor for heterocycle formation reactions due to low data availability. In this work, we compare the use of four different transfer learning methods to overcome the low data availability problem and improve the performance of retrosynthesis prediction models for ring-breaking disconnections. The mixed fine-tuned model achieves top-1 accuracy of 36.5%, and, moreover, 62.1% of its predictions are chemically valid and ring-breaking. Furthermore, we demonstrate the applicability of the mixed fine-tuned model in drug discovery by recreating synthetic routes toward two drug-like targets published in 2023. Finally, we introduce a method for further fine-tuning the model as new reaction data becomes available.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1021/acs.jcim.4c02041

Authors


Institution:
University of Oxford
Division:
MSD
Department:
NDM
Sub department:
CMD
Role:
Author
Institution:
University of Oxford
Division:
MSD
Department:
NDM
Sub department:
CMD
Role:
Author
ORCID:
0000-0003-2322-4384


Funder identifier:
https://ror.org/02ymzm013


Publisher:
American Chemical Society
Journal:
Journal of Chemical Information and Modeling
Volume:
65
Issue:
15
Pages:
7851-7861
Publication date:
2025-07-29
Acceptance date:
2025-07-01
DOI:
10.1021/acs.jcim.4c02041
EISSN:
1549-960X
ISSN:
1549-9596


Language:
English
Source identifiers:
3193268
Deposit date:
2025-08-12
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
