Journal article

araCNA: somatic copy number profiling using long-range sequence models

Abstract:
Somatic copy number alterations (CNAs) are hallmarks of cancer. Current algorithms that call CNAs from whole-genome sequenced (WGS) data have not exploited deep learning methods owing to computational scaling limitations. Here, we present a novel deep-learning approach, araCNA, trained only on simulated data that can accurately predict CNAs in real WGS cancer genomes. araCNA uses novel transformer alternatives (e.g. Mamba) to handle genomic-scale sequence lengths (∼1M) and learn long-range interactions. Results are extremely accurate on simulated data, and this zero-shot approach is on par with existing methods when applied to 50 WGS samples from the Cancer Genome Atlas. Notably, our approach requires only a tumour sample and not a matched normal sample, has fewer markers of overfitting, and performs inference in only a few minutes. araCNA demonstrates how domain knowledge can be used to simulate training sets that harness the power of modern machine learning in biological applications.
Publication status:
Published
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Role:
Author
Institution:
University of Oxford
Role:
Author
ORCID:
0000-0001-7615-8523


Funder identifier:
https://ror.org/001aqnf71


Publisher:
Oxford University Press
Journal:
NAR Genomics and Bioinformatics
Volume:
7
Issue:
3
Article number:
lqaf124
Publication date:
2025-09-09
Acceptance date:
2025-08-14
DOI:
EISSN:
2631-9268
ISSN:
2631-9268


Language:
English
Source identifiers:
3270158
Deposit date:
2025-09-09
ARK identifier:
