
Journal article

Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods

Abstract:
Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and for identifying newly emerging lineages. However, these methods are computationally expensive, struggle when datasets become very large, and require manual curation to designate new lineages. These challenges motivate the development of complementary methods that can incorporate all available genetic data without down-sampling, and that extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of algorithmic approaches based on word statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, fully automatable approach to identify and confirm newly emerging variants.
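As a generic illustration of what "word statistics" can mean for genome data, the sketch below represents each sequence as a vector of k-mer (length-k word) frequencies and compares sequences by Euclidean distance between those vectors. This is not the authors' pipeline, which this record does not describe: the function names `kmer_profile` and `euclidean`, the choice k = 3, the skipping of windows containing ambiguity codes such as `N`, and the Euclidean metric are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq: str, k: int = 3) -> list:
    """Normalized frequencies of all 4**k DNA words of length k (hypothetical helper).

    Words are ordered lexicographically (AAA, AAC, ...). Any window that
    contains a character outside A/C/G/T (e.g. an 'N' ambiguity code) is skipped.
    """
    alphabet = "ACGT"
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(alphabet)
    )
    total = sum(counts.values())
    if total == 0:
        # No valid windows: return an all-zero profile of the usual length.
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Toy sequences, not real SARS-CoV-2 data: b differs from a by one base.
    a = "ACGTACGTACGT"
    b = "ACGTACGTACGA"
    c = "AAAAAAAAAAAA"
    pa, pb, pc = (kmer_profile(s) for s in (a, b, c))
    print("d(a, b) =", round(euclidean(pa, pb), 4))
    print("d(a, c) =", round(euclidean(pa, pc), 4))
```

Because every profile has the same length (4**k entries) regardless of genome length, profiles can be computed independently for each sequence without a multiple alignment, which is one reason representations of this kind can scale to very large collections of genomes.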
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1073/pnas.2317284121

Authors

Role:
Author
ORCID:
0000-0002-0212-6825

Institution:
University of Oxford
Division:
MPLS
Department:
Biology
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-7089-7680

Role:
Author
ORCID:
0000-0002-3033-2335

Role:
Author
ORCID:
0000-0002-3436-6487

Role:
Author
ORCID:
0000-0001-5835-8062


Funder identifier:
https://ror.org/03wnrjx87
Grant:
INF\R2\180067


Publisher:
National Academy of Sciences
Journal:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
12
Article number:
e2317284121
Place of publication:
United States
Publication date:
2024-03-13
Acceptance date:
2024-02-05
EISSN:
1091-6490
ISSN:
0027-8424
Pmid:
38478692


Language:
English
Pubs id:
1806057
Local pid:
pubs:1806057
Deposit date:
2024-03-18
