Journal article
Semi-supervised Bayesian integration of multiple spatial proteomics datasets
- Abstract:
- The subcellular localisation of proteins is a key determinant of their function. High-throughput analyses of these localisations can be performed using mass spectrometry-based spatial proteomics, which enables us to examine the localisation and relocalisation of proteins. Complementary data sources, such as protein annotations and other high-throughput ‘omic assays, can provide additional functional or localisation information. Integrating these modalities can provide new insights as well as additional confidence in results, but existing approaches for integrative analyses of spatial proteomics datasets, such as concatenation-based methods and transfer-learning approaches like KNN-TL, are limited in the types of data they can integrate and do not quantify uncertainty in their predictions. Here we propose a semi-supervised Bayesian approach, in which model parameters are inferred from both labelled marker proteins and unlabelled data while prediction uncertainty is quantified, to integrate spatial proteomics datasets with other data sources and so improve the inference of protein subcellular localisation. We demonstrate that our approach outperforms other transfer-learning methods and has greater flexibility in the data it can model, including categorical annotations (e.g., Gene Ontology terms), continuous measurements (e.g., protein abundance), and temporal profiles (e.g., time-series expression data). To demonstrate this flexibility, we apply our method to integrate spatial proteomics data generated for the parasite Toxoplasma gondii with time-series gene expression data generated over its cell cycle. Our findings suggest that proteins linked to invasion organelles are associated with expression programs that peak at the end of the first cell cycle. Furthermore, this integrative analysis divides the dense granule proteins into heterogeneous populations suggestive of potentially different functions.
Our method is disseminated via the mdir R package, available on the lead author’s GitHub.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record (pdf, 6.0MB)
- Supplementary materials (zip, 2.9MB)
- Publisher copy:
- 10.1371/journal.pcbi.1013799
Authors
- Publisher:
- Public Library of Science
- Journal:
- PLoS Computational Biology
- Volume:
- 21
- Issue:
- 12
- Pages:
- e1013799-e1013799
- Article number:
- e1013799
- Publication date:
- 2025-12-15
- Acceptance date:
- 2025-11-30
- DOI:
- 10.1371/journal.pcbi.1013799
- EISSN:
- 1553-7358
- ISSN:
- 1553-734X
- Language:
- English
- UUID:
- uuid_b5d3889d-5f3c-4644-8821-ccec3f4ace9a
- Source identifiers:
- 3586986
- Deposit date:
- 2025-12-22
- ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
Terms of use
- Copyright date:
- 2025
- Licence:
- CC Attribution (CC BY)