Journal article
An open-source multi-semantic annotation dataset and automated recognition tool for viral carcinogenesis factors
- Abstract:
- In-depth investigations into the characteristics of high-risk oncogenic viruses are critical for the early prevention and control of related cancers and the development of effective vaccines. The mechanism of viral carcinogenesis involves numerous risk factors such as viral genomic variations, lifestyle, and environmental influences. Based on literature data on eight oncogenic viruses, we have created a large-scale, semantically rich corpus of viral carcinogenic factors, including 551 715 abstracts and 5 821 308 entities, using natural language processing technology combined with expert knowledge. We also developed a semantic filter to improve entity recognition performance. Moreover, transcriptomic data related to oncogenic viruses were collected. We performed gene differential expression analysis, feature gene identification, and immune microenvironment analysis. A visual knowledge platform, an open-source dataset, and a tool for automatically identifying internal and external semantic factors related to viral carcinogenesis are available at http://www.biomedinfo.cn:8281/. This study provides new insights into the key factors involved in the viral carcinogenesis process and helps researchers and clinicians quickly obtain clues for further experimental research and clinical validation.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record (pdf, 2.0MB)
- Publisher copy:
- 10.1093/database/baaf038
Authors
Funding
- Funder:
- National Natural Science Foundation of China
- Funder identifier:
- https://ror.org/01h0zpd94
- Publisher:
- Oxford University Press
- Journal:
- Database: The Journal of Biological Databases and Curation
- Volume:
- 2025
- Article number:
- baaf038
- Publication date:
- 2025-09-24
- Acceptance date:
- 2025-05-14
- DOI:
- 10.1093/database/baaf038
- EISSN:
- 1758-0463
- ISSN:
- 1758-0463
- Language:
- English
- Source identifiers:
- 3314557
- Deposit date:
- 2025-09-25
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.