Developing novel scoring functions for protein-ligand docking using machine learning

Boyles, F

Thesis

Abstract:: Structure-based drug discovery uses information about the structure of a protein to identify novel ligands that bind to it. The fundamental problem in structure-based drug discovery is predicting whether, how, and how strongly a candidate ligand binds to a protein. This is often accomplished using scoring functions, which rapidly estimate the strength with which a ligand binds to a protein, known as its binding affinity. This thesis explores the use of machine learning techniques to improve scoring functions for protein-ligand binding affinity. We first analyse the features used by several published machine learning scoring functions, and then show that augmenting these features with ligand-based features can improve scoring function performance. We then compare the performance of different machine learning algorithms. We next perform a series of experiments to investigate how the size and composition of the training set, and its similarity to the test set, influence the performance of Random Forest scoring functions. We find that, regardless of training set composition, augmenting structure-based feature sets with additional ligand-based features improves scoring function performance on a diverse test set. We further investigate the predictions of a Random Forest that uses only ligand-based features, and find that, when a ligand has different binding affinities for multiple binding partners, this ligand-only model is predictive of the ligand's mean binding affinity across those partners. Finally, we address the use of docked poses for the ligand instead of experimentally determined binding modes. We find that pose prediction errors are common. We show that using docked poses in place of crystallographic binding modes reduces scoring function performance, and that augmenting a structure-based scoring function with ligand-based features can help to counteract this effect.
We then construct a new data set and show that generalising to new data and novel targets remains challenging for machine learning scoring functions. In this thesis we examine whether the use of a more detailed representation of the physicochemical properties of a ligand can improve machine learning scoring functions for protein-ligand binding affinity.

Cite

Cite this record

APA Style

Boyles, F. (2020). Developing novel scoring functions for protein-ligand docking using machine learning [PhD thesis]. University of Oxford.

MLA Style

Boyles, F. Developing Novel Scoring Functions for Protein-Ligand Docking Using Machine Learning. University of Oxford, 2020.

Chicago Style

Boyles, F. 2020. “Developing Novel Scoring Functions for Protein-Ligand Docking Using Machine Learning.” PhD thesis, University of Oxford.

Access Document

Files:: Thesis.pdf

(Preview, Dissemination version, pdf, 23.6MB, Terms of use)

Authors

+ Boyles, F

Institution:: University of Oxford
Division:: MPLS
Department:: Statistics
Research group:: Oxford Protein Informatics Group
Oxford college:: Brasenose College
Role:: Author
ORCID:: 0000-0002-4185-1229

Contributors

+ Morris, G

Institution:: University of Oxford
Department:: Statistics
Research group:: Oxford Protein Informatics Group
Role:: Supervisor
ORCID:: 0000-0003-1731-8405

+ Deane, C

Department:: Statistics
Research group:: Oxford Protein Informatics Group
Role:: Supervisor
ORCID:: 0000-0003-1388-2252

Funding

+ Engineering and Physical Sciences Research Council

Funder identifier:: http://dx.doi.org/10.13039/501100000266
Grant:: EP/G03706X/1
Programme:: Systems Biology DTC

DOI:: 10.5287/ora-oz5x7yjdb
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Language:: English
Keywords:: virtual screening, machine learning, docking, drug discovery, scoring function
Subjects:: Computer Aided Drug Design
Pubs id:: 2042934
Local pid:: pubs:2042934
Deposit date:: 2020-07-29

Terms of use

Copyright holder:: Boyles, F

Licence:: Terms and Conditions of Use for Oxford University Research Archive
