Thesis

Machine learning for retrosynthesis and synthesisable molecule generation in drug discovery

Abstract:
Drug discovery is a notoriously difficult and slow process, with high research and development costs and a decreasing success rate. Computer-Aided Drug Design methods show promise in improving the efficiency of early-stage drug discovery, increasing the number of compounds that can be evaluated per design cycle and allowing for pre-filtering of molecules with fast computational methods before they are synthesised. However, many of the compounds designed in silico are not synthesisable in practice, or the synthesis routes towards them are not obvious. This leads to computational resources being wasted on designing molecules that can never be tested experimentally. This thesis explores new methods for two approaches to assessing and improving synthesisability in drug discovery: retrosynthesis prediction and synthesisability-constrained molecule generation.

First, the problem of retrosynthesis prediction for molecules containing heterocyclic scaffolds is considered. Four domain adaptation approaches are benchmarked to develop a single-step retrosynthesis prediction model with improved performance for ring disconnections. Accuracy for heterocycle formations and all reaction classes, as well as computational cost, are considered. A further fine-tuning workflow for continual retraining of the model with newly published data is introduced. The application of the most versatile model, trained with a mixed fine-tuning strategy, is then demonstrated in multi-step retrosynthesis in a retrospective analysis for two drug-like compounds.

Next, the development of retro-active, a method for synthesisable molecule generation and optimisation, is described. Retro-active generates molecules based on a known synthesis route and a provided starting material pool. The use of active learning for starting material selection allows the resulting product molecules to be optimised for user-defined scoring functions. A benchmark of starting material acquisition and product enumeration methods is included, as well as a comparison to alternative starting material selection approaches that are not based on machine learning. The applicability of retro-active for both ligand-based and structure-based drug discovery is demonstrated.

The use case of retro-active is then extended to multi-parameter optimisation, to simulate a real-life drug discovery scenario. The compounds are optimised for their structural, physicochemical, and ADMET properties, with a scoring function that combines physics-based and machine learning-based scores. The robustness of the method is demonstrated with both convergent and linear synthesis route topologies and ligands for different target proteins.

The thesis concludes with final remarks regarding retrosynthesis prediction and synthesisable molecule generation with retro-active, including future research directions and challenges in the field.

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Chemistry
Sub department:
Organic Chemistry
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Chemistry
Sub department:
Organic Chemistry
Role:
Supervisor
ORCID:
0000-0002-6062-8209
Institution:
University of Oxford
Division:
MSD
Department:
NDM
Role:
Supervisor


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/S024093/1


DOI:
Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
