Thesis
Developing novel scoring functions for protein-ligand docking using machine learning
- Abstract:
- Structure-based drug discovery uses information about the structure of a protein to identify novel ligands that bind to it. The fundamental problem in structure-based drug discovery is predicting whether, how, and how strongly a candidate ligand binds to a protein. This is often accomplished using scoring functions, which rapidly estimate the strength with which a ligand binds to a protein (its binding affinity). This thesis explores the use of machine learning techniques to improve scoring functions for protein-ligand binding affinity. We first analyse the features used by several published machine learning scoring functions, and then show that augmenting these features with ligand-based features can improve scoring function performance. We then compare the performance of different machine learning algorithms. We next perform a series of experiments to investigate how the size and composition of the training set, and its similarity to the test set, influence the performance of Random Forest scoring functions. We find that, regardless of training set composition, augmenting structure-based feature sets with additional ligand-based features leads to enhanced scoring function performance on a diverse test set. We further investigate the predictions of a Random Forest using only ligand-based features, and find that, when a ligand has different binding affinities for multiple binding partners, this ligand-only model is predictive of the ligand's mean binding affinity across those partners. Finally, we address the use of docked poses for the ligand instead of experimentally determined binding modes. We find that pose prediction errors are common. We show that using docked poses in place of crystallographic binding modes reduces scoring function performance, and that augmenting a structure-based scoring function with ligand-based features can help to counteract this effect.
We then construct a new data set and show that generalising to new data and novel targets remains challenging for machine learning scoring functions. In this thesis we examine whether the use of a more detailed representation of the physicochemical properties of a ligand can improve machine learning scoring functions for protein-ligand binding affinity.
Authors
Contributors
+ Morris, G
- Institution:
- University of Oxford
- Department:
- Statistics
- Research group:
- Oxford Protein Informatics Group
- Role:
- Supervisor
- ORCID:
- 0000-0003-1731-8405
+ Deane, C
- Department:
- Statistics
- Research group:
- Oxford Protein Informatics Group
- Role:
- Supervisor
- ORCID:
- 0000-0003-1388-2252
+ Engineering and Physical Sciences Research Council
- Funder identifier:
- http://dx.doi.org/10.13039/501100000266
- Grant:
- EP/G03706X/1
- Programme:
- Systems Biology DTC
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Pubs id:
- 2042934
- Local pid:
- pubs:2042934
- Deposit date:
- 2020-07-29
Terms of use
- Copyright holder:
- Boyles, F
- Copyright date:
- 2020