Journal article
Self-supervised monocular depth and pose estimation for endoscopy with latent priors
- Abstract:
- Accurate 3D reconstruction in endoscopy enables quantitative and holistic lesion characterization within the gastrointestinal (GI) tract. To achieve this, reliable depth and pose estimation is required. However, endoscopy systems are monocular, and existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a StyleGAN-based generator and a Variational Autoencoder (VAE). The StyleGAN generator leverages extensive depth scenes from natural images to condition the depth network, enhancing realism and robustness of depth predictions through latent feature priors. For pose estimation, we reformulate it within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. To further enhance pose stability and generalizability, we introduce a prior transfer module that distills motion knowledge from natural scene SLAM systems. Specifically, pose priors from a pretrained SLAM model, supervised on large-scale natural scene datasets, are used to guide the latent distribution of pose through a KL-divergence reparameterization. This mechanism effectively transfers structural motion priors into the endoscopic domain, improving trajectory consistency under challenging conditions. This dual refinement pipeline enables accurate depth and pose predictions, effectively addressing the GI tract's complex textures and lighting. Extensive evaluations on SimCol, C3VD, and EndoSLAM datasets confirm our framework's superior performance over published self-supervised methods in endoscopic depth and pose estimation. All data descriptions and code are available at https://github.com/EricXuziang/Self-supervised-with-Latent-Priors.git.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
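The abstract describes guiding the latent pose distribution toward a SLAM-derived prior through a KL-divergence term, but it does not give the loss in closed form. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes a 6-DoF pose vector, a diagonal Gaussian posterior from a pose encoder, and a diagonal Gaussian prior centred on the SLAM pose with a chosen variance. The function names `kl_diag_gaussians` and `sample_pose`, and all numeric values, are hypothetical.

```python
import math
import random


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    q is the (assumed) encoder posterior over the pose transition;
    p is the (assumed) prior centred on the pose from a pretrained SLAM model.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        var_q = math.exp(lq)
        var_p = math.exp(lp)
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total


def sample_pose(mu_q, log_var_q, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu_q, log_var_q)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 6-DoF pose transition: (tx, ty, tz, rx, ry, rz).
    mu_q = [0.01, -0.02, 0.10, 0.0, 0.01, 0.0]       # encoder mean (illustrative)
    log_var_q = [-4.0] * 6                            # encoder log-variance (illustrative)
    mu_slam = [0.0, 0.0, 0.12, 0.0, 0.0, 0.0]         # SLAM pose prior (illustrative)
    log_var_slam = [-3.0] * 6                         # assumed prior log-variance

    pose_sample = sample_pose(mu_q, log_var_q, rng)
    kl_term = kl_diag_gaussians(mu_q, log_var_q, mu_slam, log_var_slam)
    print("sampled pose:", pose_sample)
    print("KL regularizer:", kl_term)
```

In a training loop, a term like `kl_term` would typically be weighted and added to the self-supervised photometric loss; the weighting and the choice of prior variance are design decisions that the record does not specify.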
- Files:
- Accepted manuscript (pdf, 2.1MB)
- Publisher copy:
- 10.1109/tmi.2026.3671423
Authors
+ Academy of Medical Sciences
- Funder identifier:
- https://ror.org/00c489v88
- Grant:
- SBF00101191
+ Engineering and Physical Sciences Research Council
- Funder identifier:
- 10.13039/501100000266
- Grant:
- UKRI914
+ NIHR Oxford Biomedical Research Centre
- Funder identifier:
- 10.13039/501100013373
- Publisher:
- IEEE
- Journal:
- IEEE Transactions on Medical Imaging
- Place of publication:
- United States
- Publication date:
- 2026-03-09
- DOI:
- 10.1109/tmi.2026.3671423
- EISSN:
- 1558-254X
- ISSN:
- 0278-0062
- Pmid:
- 41801778
- Language:
- English
- Keywords:
- Pubs id:
- 2392771
- Local pid:
- pubs:2392771
- Source identifiers:
- W7134846593
- Deposit date:
- 2026-04-29
- ARK identifier:
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2026
- Rights statement:
- © 2026 IEEE. All rights reserved, including rights for text and data mining and training of artificial intelligence and similar technologies
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)