Thesis

Developing novel scoring functions for protein-ligand docking using machine learning

Abstract:
Structure-based drug discovery uses information about the structure of a protein to identify novel ligands that bind to the protein. The fundamental problem in structure-based drug discovery is predicting if, how, and how strongly a possible ligand binds to a protein. This is often accomplished using scoring functions to rapidly estimate the strength with which a ligand binds to a protein, known as its binding affinity. This thesis explores the use of machine learning techniques to improve scoring functions for protein-ligand binding affinity. We first analyse the features used by several published machine learning scoring functions, before showing that augmenting these features with ligand-based features can improve scoring function performance. We then compare the performance of different machine learning algorithms. We next perform a series of experiments to investigate how the size and composition of the training set, and its similarity to the test set, influence the performance of Random Forest scoring functions. We find that, regardless of training set composition, augmenting structure-based feature sets with additional ligand-based features leads to enhanced scoring function performance on a diverse test set. We further investigate the predictions of a Random Forest using only ligand-based features, and find that, when a ligand has different binding affinities for multiple binding partners, this ligand-only model is predictive of the mean binding affinity of a ligand for its binding partners. Finally, we address the use of docked poses for the ligand instead of experimentally determined binding modes. We find that pose prediction errors are common. We show that using docked poses in place of crystallographic binding modes reduces scoring function performance, and that augmenting a structure-based scoring function with ligand-based features can help to counteract this effect.
We then construct a new data set and show that generalising to new data and novel targets remains challenging for machine learning scoring functions. In this thesis we examine whether the use of a more detailed representation of the physicochemical properties of a ligand can improve machine learning scoring functions for protein-ligand binding affinity.

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Research group:
Oxford Protein Informatics Group
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-4185-1229

Contributors

Institution:
University of Oxford
Department:
Statistics
Research group:
Oxford Protein Informatics Group
Role:
Supervisor
ORCID:
0000-0003-1731-8405
Department:
Statistics
Research group:
Oxford Protein Informatics Group
Role:
Supervisor
ORCID:
0000-0003-1388-2252


Funder identifier:
http://dx.doi.org/10.13039/501100000266
Grant:
EP/G03706X/1
Programme:
Systems Biology DTC


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Pubs id:
2042934
Local pid:
pubs:2042934
Deposit date:
2020-07-29
