Journal article

An open-source multi-semantic annotation dataset and automated recognition tool for viral carcinogenesis factors

Abstract:
In-depth investigations into the characteristics of high-risk oncogenic viruses are critical for the early prevention and control of related cancers and the development of effective vaccines. The mechanism of viral carcinogenesis involves numerous risk factors such as viral genomic variations, lifestyle, and environmental influences. Based on literature data on eight oncogenic viruses, we have created a large-scale, semantically rich corpus of viral carcinogenic factors, including 551 715 abstracts and 5 821 308 entities, using natural language processing technology combined with expert knowledge. We also developed a semantic filter to improve entity recognition performance. Moreover, transcriptomic data related to oncogenic viruses were collected. We performed gene differential expression analysis, feature gene identification, and immune microenvironment analysis. A visual knowledge platform, an open-source dataset, and a tool for automatically identifying internal and external semantic factors related to viral carcinogenesis are available at http://www.biomedinfo.cn:8281/. This study provides new insights into the key factors involved in the viral carcinogenesis process and helps researchers and clinicians quickly obtain clues for further experimental research and clinical validation.
Publication status:
Published
Peer review status:
Peer reviewed

Access Document


Publisher copy:
10.1093/database/baaf038

Authors


Role:
Author
ORCID:
0009-0003-2739-5348

Institution:
University of Oxford
Oxford college:
Wolfson College
Role:
Author


Funder identifier:
https://ror.org/01h0zpd94


Publisher:
Oxford University Press
Journal:
Database: The Journal of Biological Databases and Curation
Volume:
2025
Article number:
baaf038
Publication date:
2025-09-24
Acceptance date:
2025-05-14
DOI:
10.1093/database/baaf038
EISSN:
1758-0463
ISSN:
1758-0463


Language:
English
Source identifiers:
3314557
Deposit date:
2025-09-25
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
