Journal article

Identifying genes associated with invasive disease in S. pneumoniae by applying a machine learning approach to whole genome sequence typing data

Abstract:
Streptococcus pneumoniae, a normal commensal of the upper respiratory tract, is a major public health concern, responsible for substantial global morbidity and mortality due to pneumonia, meningitis and sepsis. Why some pneumococci invade the bloodstream or cerebrospinal fluid (CSF), causing so-called invasive pneumococcal disease (IPD), is uncertain. In this study we identify genes associated with IPD. We transform whole genome sequence (WGS) data into a sequence typing scheme, avoiding the pitfall of using an arbitrary genome as a reference by substituting a constructed pangenome. We then apply a random forest machine-learning algorithm to the transformed data, and find 43 genes consistently associated with IPD across three geographically distinct WGS data sets of pneumococcal carriage isolates. Of the genes associated with IPD, 23 have previously been shown to be directly relevant to IPD and 18 are uncharacterized. We suggest that these uncharacterized genes are also likely to be relevant to IPD.
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1038/s41598-019-40346-7

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Zoology
Oxford college:
Linacre College
Role:
Author


Publisher:
Nature Research
Journal:
Scientific Reports
Volume:
9
Issue:
2019
Article number:
4049
Publication date:
2019-03-11
Acceptance date:
2019-02-04
DOI:
10.1038/s41598-019-40346-7
EISSN:
2045-2322


Pubs id:
pubs:972965
UUID:
uuid:ddf77e80-dd72-46c6-9a58-aa8e8b7791bd
Local pid:
pubs:972965
Source identifiers:
972965
Deposit date:
2019-02-14
