Journal article

Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences

Abstract:
BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences, and investigate the information they contain about Pango lineage status.

RESULTS: Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences we introduce the notion of designating such sequences to a “lineage set”, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence.

CONCLUSIONS: We find that many lineages, including the main variants of concern, can be reliably identified by spike alone and we define lineage sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences. These data provide a foundation for the development of software tools that can assign newly generated spike nucleotide sequences to Pango lineage sets.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08358-2.
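The "lineage set" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' software: the function names, the representation of a spike sequence as a set of mutation strings, and the toy lineage and mutation labels are all hypothetical, chosen only to show how lineages that share an identical spike profile collapse into one set.

```python
from collections import defaultdict

def build_lineage_sets(designations):
    """Group lineages by spike mutation profile (hypothetical helper).

    designations: iterable of (lineage_name, spike_mutations) pairs.
    Returns a dict mapping each distinct mutation profile (a frozenset)
    to a sorted tuple of all lineages that carry exactly that profile.
    """
    grouped = defaultdict(set)
    for lineage, mutations in designations:
        grouped[frozenset(mutations)].add(lineage)
    return {profile: tuple(sorted(names)) for profile, names in grouped.items()}

def assign_lineage_set(spike_mutations, lineage_sets):
    """Return the lineage set consistent with the observed spike mutations.

    An empty tuple means the profile was not seen in the designations.
    """
    return lineage_sets.get(frozenset(spike_mutations), ())

# Toy data: labels are made up and do not correspond to real lineages.
toy_designations = [
    ("lineage_A", ["C100T", "G200A"]),
    ("lineage_B", ["C100T", "G200A"]),  # same spike profile as lineage_A
    ("lineage_C", ["A300G"]),           # unique spike profile
]
lineage_sets = build_lineage_sets(toy_designations)

shared = assign_lineage_set(["G200A", "C100T"], lineage_sets)
unique = assign_lineage_set(["A300G"], lineage_sets)
```

In this toy case `shared` contains two lineages, because their spike profiles are indistinguishable, while `unique` contains one. Exact profile matching is a simplifying assumption; a real tool would also need to handle ambiguous bases and incomplete coverage.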
Publication status:
Published
Peer review status:
Peer reviewed

Authors

Role:
Author
ORCID:
0000-0001-8083-474X

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0002-8797-2667

Role:
Author
ORCID:
0000-0002-6505-7281

Role:
Author
ORCID:
0000-0002-6988-8576

Role:
Author
ORCID:
0000-0003-4337-3707


Funder identifier:
10.13039/100016127
Grant:
2236

Funder identifier:
10.13039/100004440
Grant:
grant.203783/Z/16/Z


Publisher:
BioMed Central
Journal:
BMC Genomics
Volume:
23
Issue:
1
Pages:
121-121
Article number:
121
Publication date:
2022-02-11
DOI:
10.1186/s12864-022-08358-2
EISSN:
1471-2164
ISSN:
1471-2164


Language:
English
Keywords:
Pubs id:
1241334
Local pid:
pubs:1241334
Source identifiers:
W4225851399
Deposit date:
2026-04-09
ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
