Machine learning for retrosynthesis and synthesisable molecule generation in drug discovery

Wieczorek, E

Thesis

Abstract: Drug discovery is a notoriously difficult and slow process, with high research and development costs and a decreasing success rate. Computer-Aided Drug Design methods show promise in improving the efficiency of early-stage drug discovery, increasing the number of compounds that can be evaluated per design cycle and allowing molecules to be pre-filtered with fast computational methods before they are synthesised. However, many of the compounds designed in silico are not synthesisable in practice, or the synthesis routes towards them are not obvious. This leads to computational resources being wasted on designing molecules that can never be tested experimentally. This thesis explores new methods for two approaches to assessing and improving synthesisability in drug discovery: retrosynthesis prediction and synthesisability-constrained molecule generation.

First, the problem of retrosynthesis prediction for molecules containing heterocyclic scaffolds is considered. Four domain adaptation approaches are benchmarked to develop a single-step retrosynthesis prediction model with improved performance for ring disconnections. Accuracy for heterocycle formations and all reaction classes, as well as computational cost, are considered. A further fine-tuning workflow for continual retraining of the model with newly published data is introduced. The application of the most versatile model, trained with a mixed fine-tuning strategy, is then demonstrated in multi-step retrosynthesis in a retrospective analysis for two drug-like compounds.

Next, the development of retro-active, a method for synthesisable molecule generation and optimisation, is described. Retro-active generates molecules based on a known synthesis route and a provided starting material pool. The use of active learning for starting material selection allows the resulting product molecules to be optimised for user-defined scoring functions. A benchmark of starting material acquisition and product enumeration methods is included, as well as a comparison to alternative starting material selection approaches that do not use machine learning. The applicability of retro-active for both ligand-based and structure-based drug discovery is demonstrated.
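The general idea of using active learning to pick candidates from a pool can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the retro-active implementation: the function name `active_learning_select`, the 1-nearest-neighbour surrogate, the greedy acquisition rule, and the numeric feature tuples standing in for starting materials are all hypothetical choices made for illustration. In a real setting the score would presumably be computed on the product enumerated from each starting material rather than on the starting material itself.

```python
import random


def active_learning_select(pool, score_fn, n_rounds=5, batch_size=4, seed=0):
    """Toy greedy active-learning loop over a candidate pool.

    pool: list of numeric feature tuples (hypothetical stand-ins for
          starting materials).
    score_fn: expensive oracle (hypothetical stand-in for a user-defined
          scoring function).
    Returns a dict mapping pool index -> observed score.
    """
    rng = random.Random(seed)
    observed = {}

    # Round 0: score a random initial batch to seed the surrogate.
    for i in rng.sample(range(len(pool)), batch_size):
        observed[i] = score_fn(pool[i])

    def predict(x):
        # Surrogate model (assumption): reuse the score of the nearest
        # already-scored candidate (1-nearest-neighbour).
        nearest = min(
            observed,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, pool[j])),
        )
        return observed[nearest]

    for _ in range(n_rounds):
        candidates = [i for i in range(len(pool)) if i not in observed]
        if not candidates:
            break
        # Greedy acquisition (assumption): take the candidates with the
        # highest predicted score, then query the oracle for them.
        ranked = sorted(candidates, key=lambda i: predict(pool[i]), reverse=True)
        for i in ranked[:batch_size]:
            observed[i] = score_fn(pool[i])

    return observed
```

With a pool of 50 candidates and the defaults above, the loop scores 24 of them (one initial batch plus five acquisition rounds of four) instead of the full pool, which is the saving that active learning is meant to provide when the scoring function is expensive.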

The use case of retro-active is then extended to multi-parameter optimisation, to simulate a real-life drug discovery scenario. The compounds are optimised for their structural, physicochemical, and ADMET properties, with a scoring function that combines physics-based and machine learning-based scores. The robustness of the method is demonstrated with both convergent and linear synthesis route topologies and ligands for different target proteins.
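The abstract does not state how the individual scores are combined, so the following is only one common way such a multi-parameter objective might be aggregated: a weighted geometric mean of per-property desirabilities scaled to [0, 1]. The function name `combined_score` and the property names in the usage note are hypothetical.

```python
import math


def combined_score(desirabilities, weights=None):
    """Weighted geometric mean of per-property desirabilities in [0, 1].

    desirabilities: dict mapping property name -> value in [0, 1].
    weights: optional dict mapping property name -> non-negative weight
             (defaults to equal weights).
    """
    if not desirabilities:
        raise ValueError("at least one property is required")
    if weights is None:
        weights = {name: 1.0 for name in desirabilities}

    # Only properties with a positive weight contribute to the score.
    active = [name for name in desirabilities if weights[name] > 0.0]
    if not active:
        raise ValueError("at least one weight must be positive")

    for name in active:
        if not 0.0 <= desirabilities[name] <= 1.0:
            raise ValueError(f"desirability for {name!r} must lie in [0, 1]")

    # A single zero desirability vetoes the compound entirely.
    if any(desirabilities[name] == 0.0 for name in active):
        return 0.0

    total_weight = sum(weights[name] for name in active)
    log_sum = sum(weights[name] * math.log(desirabilities[name]) for name in active)
    return math.exp(log_sum / total_weight)
```

For example, `combined_score({"docking": 0.25, "solubility": 1.0})` gives 0.5. The geometric mean is chosen here because one very poor property pulls the overall score down sharply, whereas an arithmetic mean would let strong properties mask it.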

The thesis concludes with final remarks regarding retrosynthesis prediction and synthesisable molecule generation with retro-active, including future research directions and challenges in the field.

Cite this record

APA Style

Wieczorek, E. (2024). Machine learning for retrosynthesis and synthesisable molecule generation in drug discovery [PhD thesis]. University of Oxford.