Journal article
Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
- Abstract:
- BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences, and investigate the information they contain about Pango lineage status.
  RESULTS: Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences, we introduce the notion of designating such sequences to a “lineage set”, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence.
  CONCLUSIONS: We find that many lineages, including the main variants of concern, can be reliably identified by spike alone, and we define lineage sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences. These data provide a foundation for the development of software tools that can assign newly generated spike nucleotide sequences to Pango lineage sets.
  SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08358-2.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
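The “lineage set” idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or any published tool: the function names (`build_lineage_sets`, `assign_lineage_set`), the lineage labels, and the short sequences are all hypothetical placeholders. It also assumes exact spike sequence matching, which is a simplification. The sketch groups lineages by identical spike sequence, so that a spike-only query returns every lineage consistent with it.

```python
from collections import defaultdict


def build_lineage_sets(records):
    """Map each distinct spike sequence to the set of lineages observed with it.

    records: iterable of (lineage_name, spike_sequence) pairs.
    """
    sets_by_spike = defaultdict(set)
    for lineage, spike in records:
        sets_by_spike[spike].add(lineage)
    return {spike: frozenset(names) for spike, names in sets_by_spike.items()}


def assign_lineage_set(spike, lineage_sets):
    """Return the lineage set for an exact spike match, or an empty set if unseen."""
    return lineage_sets.get(spike, frozenset())


# Toy data: labels and sequences are placeholders, not real Pango data.
toy_records = [
    ("lineage_A", "ATGTTT"),
    ("lineage_B", "ATGTTT"),  # same spike as lineage_A, so the two are indistinguishable
    ("lineage_C", "ATGTTC"),
]
lineage_sets = build_lineage_sets(toy_records)
```

In this toy setup, a query with the spike shared by `lineage_A` and `lineage_B` yields a set containing both, while the spike unique to `lineage_C` yields a single-member set, mirroring the abstract's distinction between ambiguous and clearly identifiable lineages.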
Access Document
- Publisher copy:
- 10.1186/s12864-022-08358-2
- Publication website:
- https://www.research.ed.ac.uk/files/255298512/s12864_022_08358_2.pdf
Authors
+ Wellcome Trust
- Funder identifier:
- 10.13039/100004440
- Grant:
- grant.203783/Z/16/Z
+ Oxford Martin School, University of Oxford
- Funder identifier:
- 10.13039/501100004211
- Publisher:
- BioMed Central
- Journal:
- BMC Genomics
- Volume:
- 23
- Issue:
- 1
- Pages:
- 121-121
- Article number:
- 121
- Publication date:
- 2022-02-11
- DOI:
- 10.1186/s12864-022-08358-2
- EISSN:
- 1471-2164
- ISSN:
- 1471-2164
- Language:
- English
- Keywords:
- Pubs id:
- 1241334
- Local pid:
- pubs:1241334
- Source identifiers:
- W4225851399
- Deposit date:
- 2026-04-09
- ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
Terms of use
- Copyright date:
- 2022
- Licence:
- CC Attribution (CC BY)