Internet publication
Exploring QSAR models for activity-cliff prediction
- Abstract:
Introduction and Methodology: Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, a study to explore the AC-prediction power of modern QSAR methods and its relationship to general QSAR prediction performance is lacking. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease.
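To make one of the nine representation/regressor combinations concrete, a k-nearest-neighbour regressor over binary fingerprints can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's implementation. Fingerprints are represented as sets of on-bit indices, as an ECFP generator would produce them. Similarity is the Tanimoto coefficient, and the prediction is the unweighted mean activity of the k most similar training compounds. The Tanimoto metric, the function names `tanimoto` and `knn_predict`, and the toy data are all hypothetical choices for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a binary fingerprint as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_predict(query: Fingerprint,
                train: Sequence[Tuple[Fingerprint, float]],
                k: int = 3) -> float:
    """Predict activity as the mean activity of the k most Tanimoto-similar training compounds."""
    if not train:
        raise ValueError("training set is empty")
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    return sum(activity for _, activity in neighbours) / len(neighbours)


# Toy training set (hypothetical fingerprints with pKi-like activities).
train_set: List[Tuple[Fingerprint, float]] = [
    (frozenset({1, 2, 3, 4}), 7.0),
    (frozenset({1, 2, 3, 5}), 7.2),
    (frozenset({10, 11, 12}), 5.0),
    (frozenset({10, 11, 13}), 5.4),
]

print(knn_predict(frozenset({1, 2, 3}), train_set, k=2))
```

Because a kNN prediction averages over structural neighbours, two near-identical compounds tend to receive near-identical predictions. This is one intuitive reason why such a model may smooth over an activity cliff.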
Results and Conclusions: We observe low AC-sensitivity amongst the tested models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are competitive with or superior to classical molecular representations for AC-classification, so models built on them can serve as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR prediction, however, extended-connectivity fingerprints still consistently deliver the best performance. Our results provide strong support for the hypothesis that QSAR methods do indeed frequently fail to predict ACs. We propose twin-network training for deep learning models as a potential future pathway to increase AC-sensitivity and thus overall QSAR performance.
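The two evaluation settings above can be illustrated with a small sketch of how a regression model is turned into an AC classifier and scored by AC-sensitivity. In the first setting both activities are predicted; in the second the true activity of one compound is supplied. This is a simplified sketch, not the study's protocol. It assumes that a pair is an AC when the absolute activity difference exceeds a single hypothetical threshold of 1.5 log units (the exact AC definition used in the study is not restated on this page). It also assumes that the "known" compound is the first member of each pair. `is_cliff`, `ac_sensitivity` and the toy values are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: a pair counts as an activity cliff when the absolute activity
# difference (in log units, e.g. pKi) exceeds this hypothetical threshold.
AC_THRESHOLD = 1.5


def is_cliff(act_a: float, act_b: float, threshold: float = AC_THRESHOLD) -> bool:
    """Binary AC label derived from two activity values."""
    return abs(act_a - act_b) > threshold


def ac_sensitivity(pairs: Sequence[Tuple[int, int]],
                   y_true: Sequence[float],
                   y_pred: Sequence[float],
                   first_known: bool = False,
                   threshold: float = AC_THRESHOLD) -> float:
    """Fraction of true ACs that are also flagged as ACs (recall on the AC class).

    If first_known is True, the true activity of the first compound in each
    pair replaces its prediction (the "one activity given" setting).
    """
    true_positives = 0
    actual_cliffs = 0
    for i, j in pairs:
        if not is_cliff(y_true[i], y_true[j], threshold):
            continue  # sensitivity only counts pairs that really are ACs
        actual_cliffs += 1
        act_i = y_true[i] if first_known else y_pred[i]
        if is_cliff(act_i, y_pred[j], threshold):
            true_positives += 1
    if actual_cliffs == 0:
        raise ValueError("no activity cliffs among the given pairs")
    return true_positives / actual_cliffs


# Toy example (hypothetical values): pair (0, 1) is a true AC, pair (2, 3) is not.
y_true = [8.0, 6.0, 7.0, 7.2]
y_pred = [7.2, 6.4, 7.1, 7.0]
pairs = [(0, 1), (2, 3)]
print(ac_sensitivity(pairs, y_true, y_pred))                    # → 0.0
print(ac_sensitivity(pairs, y_true, y_pred, first_known=True))  # → 1.0
```

In this toy case the model compresses the true 2.0-unit gap to 0.8 units when predicting both compounds. The cliff is recovered once one true activity anchors the comparison, which mirrors the qualitative pattern reported above.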
- Publication status:
- Accepted
- Peer review status:
- Reviewed (other)
- Publisher copy:
- 10.48550/arXiv.2301.13644
- Publisher:
- Cornell University
- Publication date:
- 2023-01-31
- Language:
- English
- Pubs id:
- 1326502
- Local pid:
- pubs:1326502
- Deposit date:
- 2023-03-10
Terms of use
- Copyright holder:
- Dablander et al.
- Copyright date:
- 2023
- Rights statement:
- © 2023 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0).
- Notes:
- This is the pre-print version of the article. The final version is available online from BioMed Central at: https://doi.org/10.1186/s13321-023-00708-w.
- Licence:
- CC Attribution (CC BY)