Internet publication

Exploring QSAR models for activity-cliff prediction

Abstract:

Introduction and Methodology: Pairs of similar compounds that differ only by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs therefore form a major source of prediction error. However, a study of the AC-prediction power of modern QSAR methods and its relationship to general QSAR-prediction performance has been lacking. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons). We then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa and SARS-CoV-2 main protease.
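A minimal sketch of how a regression-type QSAR model can be reused as an AC classifier when both activities are unknown, assuming a simple absolute-difference criterion. Here `predict` stands in for any of the nine fitted models (a callable mapping a compound identifier, e.g. a SMILES string, to a predicted activity). The cutoff `AC_THRESHOLD = 2.0` log units and the toy values are hypothetical and chosen for illustration; they are not necessarily the AC definition used in the study.

```python
from typing import Callable

# Hypothetical cutoff: a pair of similar compounds counts as an activity
# cliff when their activities (e.g. pKi values) differ by >= 2 log units.
AC_THRESHOLD = 2.0


def is_activity_cliff(act_a: float, act_b: float, threshold: float = AC_THRESHOLD) -> bool:
    """Label a pair of similar compounds as AC (True) or non-AC (False)."""
    return abs(act_a - act_b) >= threshold


def classify_pair_both_unknown(
    predict: Callable[[str], float],
    compound_a: str,
    compound_b: str,
    threshold: float = AC_THRESHOLD,
) -> bool:
    """Both activities are unknown, so both are taken from the QSAR model."""
    return is_activity_cliff(predict(compound_a), predict(compound_b), threshold)


# Toy usage with a dictionary standing in for a trained model (invented values).
toy_predictions = {"A": 6.0, "B": 7.0}
print(classify_pair_both_unknown(toy_predictions.__getitem__, "A", "B"))  # False
```

Because each prediction carries its own error, a model that smooths the structure-activity landscape tends to shrink predicted differences and so miss true cliffs in this setting.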

Results and Conclusions: We observe low AC-sensitivity amongst the tested models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance. Our results provide strong support for the hypothesis that QSAR methods do indeed frequently fail to predict ACs. We propose twin-network training for deep learning models as a potential future pathway to increase AC-sensitivity and thus overall QSAR performance.
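The gain in AC-sensitivity when one activity is known can be illustrated with a hedged sketch. It assumes the same hypothetical 2.0 log-unit cutoff and invented toy values. The measured activity replaces one of the two predictions, so only one model error enters the difference. AC-sensitivity is computed here as the standard true-positive rate: the fraction of true ACs that the classifier flags. The function names are illustrative, not taken from the authors' code.

```python
from typing import Callable, Sequence

# Hypothetical AC cutoff in log units (e.g. pKi); an assumption for illustration.
AC_THRESHOLD = 2.0


def classify_pair_one_known(
    predict: Callable[[str], float],
    known_activity: float,
    unknown_compound: str,
    threshold: float = AC_THRESHOLD,
) -> bool:
    """One activity is measured, so only the other compound is predicted."""
    return abs(known_activity - predict(unknown_compound)) >= threshold


def ac_sensitivity(true_labels: Sequence[bool], predicted_labels: Sequence[bool]) -> float:
    """Fraction of true ACs that are also predicted as ACs (true-positive rate)."""
    true_positives = sum(1 for t, p in zip(true_labels, predicted_labels) if t and p)
    n_true_acs = sum(1 for t in true_labels if t)
    return true_positives / n_true_acs if n_true_acs else float("nan")


# Invented values: measured activities, and a model that smooths the cliff.
measured = {"A": 5.0, "B": 8.0}
predicted = {"A": 6.0, "B": 7.0}

# True label: |5.0 - 8.0| = 3.0 >= 2.0, so (A, B) is an AC.
# Both predicted: |6.0 - 7.0| = 1.0 < 2.0, so the cliff would be missed.
# A measured, B predicted: |5.0 - 7.0| = 2.0 >= 2.0, so the cliff is recovered.
print(classify_pair_one_known(predicted.__getitem__, measured["A"], "B"))  # True
print(ac_sensitivity([True, True, False], [False, True, False]))  # 0.5
```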

Publication status:
Accepted
Peer review status:
Reviewed (other)

Publisher copy:
10.48550/arXiv.2301.13644

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Oxford college:
Mansfield College
Role:
Author
ORCID:
0000-0001-6218-8860
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Oxford college:
Somerville College
Role:
Author
ORCID:
0000-0002-0583-4595
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Oxford college:
Green Templeton College
Role:
Author
ORCID:
0000-0003-1731-8405


Publisher:
Cornell University
Publication date:
2023-01-31