Journal article

BreakAlign: a Perl program to align chimaeric (split) genomic NGS reads and allow visual confirmation of novel retroviral integrations

Abstract:
Background: Retroviruses replicate by integrating a DNA copy into a host chromosome. Detecting novel retroviral integrations (ones not in the reference genome sequence of the host) from genomic NGS data is bioinformatically challenging and frequently produces many false positives. One common method of confirmation is visual inspection of an alignment of the chimaeric (split) reads that span a putative novel retroviral integration site. We perceived the need for a program that would facilitate this by producing a multiple alignment containing both the viral and host regions that flank an integration.

Results: BreakAlign is a Perl program that uses blastn to produce such a multiple alignment. In addition to the NGS dataset and a reference viral sequence, the program requires either (a) the ~500 nt host genome sequence that spans the putative integration or (b) coordinates of this putative integration in an installed copy of the reference human genome (multiple integrations can be processed automatically). BreakAlign is freely available from https://github.com/marchiem/breakalign and is accompanied by example files allowing a test run.

Conclusion: BreakAlign will confirm and facilitate characterisation of both (a) germline integrations of endogenous retroviruses and (b) somatic integrations of exogenous retroviruses such as HIV and HTLV. Although developed for use with genomic short-read NGS (second generation) data and retroviruses, it should also be useful for long-read (third generation) data and any mobile element with at least one conserved flanking region.
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1186/s12859-022-04621-1

Authors

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0003-1741-9353

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0002-9731-7623

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0003-4307-9161

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0001-7163-7277

Role:
Author
ORCID:
0000-0002-0141-4753


Funder identifier:
10.13039/100004440
Grant:
WT109965MA
WT086173


Publisher:
BioMed Central
Journal:
BMC Bioinformatics
Volume:
23
Issue:
1
Pages:
134-134
Article number:
134
Publication date:
2022-04-15
DOI:
EISSN:
1471-2105
ISSN:
1471-2105


Language:
English
Keywords:
Pubs id:
1254711
Local pid:
pubs:1254711
Source identifiers:
W4223993063
Deposit date:
2026-04-23
ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
