Journal article
Quantum mechanical electronic and geometric parameters for DNA k-mers as features for machine learning
- Abstract:
- We are witnessing a steep increase in model development initiatives in genomics that employ high-end machine learning methodologies. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere collection of four letters (A, T, G and C), dismissing past scientific advances that enable the use of more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all the permutations of 7-meric DNA in their representative B, A and Z conformations. The database is generated by employing the applicable high-cost and time-consuming QM methodologies. It thus makes it straightforward to associate a wealth of novel molecular features with any DNA sequence, by scanning the sequence with a matching k-meric window and pulling the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of our deposited features through their exclusive use in developing a model for A→C mutation rates.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
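The lookup described in the abstract (scanning a sequence with a k-meric window and pulling pre-computed values) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names, the `toy_table` dictionary and its `placeholder_feature` values are hypothetical and do not reflect the actual file layout, feature names or values of the deposited database.

```python
from itertools import product

K = 7  # window length matching the 7-mers described in the abstract


def all_kmers(k=K, alphabet="ACGT"):
    """Enumerate every k-mer over the alphabet (4**k sequences)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def scan_features(sequence, feature_table, k=K):
    """Slide a k-wide window over `sequence` and look up each k-mer.

    Returns a list of (start_index, kmer, features) tuples. Windows whose
    k-mer is absent from the table (for example, one containing N) get None.
    """
    sequence = sequence.upper()
    rows = []
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        rows.append((start, kmer, feature_table.get(kmer)))
    return rows


# Hypothetical stand-in for the database: placeholder numbers only,
# NOT values taken from the published dataset.
toy_table = {
    kmer: {"placeholder_feature": float(i)}
    for i, kmer in enumerate(all_kmers())
}

rows = scan_features("ACGTACGTAC", toy_table)
for start, kmer, features in rows:
    print(start, kmer, features)
```

A sequence of length n yields n - k + 1 windows, so each position away from the sequence ends is covered by several overlapping 7-mers. How the per-window features are aggregated into model inputs is a modelling choice that this sketch does not address.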
- Publisher copy:
- 10.1038/s41597-024-03772-5
- Publisher:
- Nature Research
- Journal:
- Scientific Data
- Volume:
- 11
- Issue:
- 1
- Article number:
- 911
- Publication date:
- 2024-08-22
- Acceptance date:
- 2024-08-13
- DOI:
- 10.1038/s41597-024-03772-5
- EISSN:
- 2052-4463
- Language:
- English
- Source identifiers:
- 2211370
- Deposit date:
- 2024-08-23
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.