Self-supervised monocular depth and pose estimation for endoscopy with latent priors

Xu, Z; Li, B; Hu, Y; Zhang, C; East, J; Ali, S; Rittscher, J

AI Collection

Journal article

Abstract:: Accurate 3D reconstruction in endoscopy enables quantitative and holistic lesion characterization within the gastrointestinal (GI) tract. To achieve this, reliable depth and pose estimation is required. However, endoscopy systems are monocular, and existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a StyleGAN-based generator and a Variational Autoencoder (VAE). The StyleGAN generator leverages extensive depth scenes from natural images to condition the depth network, enhancing realism and robustness of depth predictions through latent feature priors. We reformulate pose estimation within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. To further enhance pose stability and generalizability, we introduce a prior transfer module that distills motion knowledge from natural scene SLAM systems. Specifically, pose priors from a pretrained SLAM model (supervised on large-scale natural scene datasets) are used to guide the latent distribution of pose through a KL-divergence reparameterization. This mechanism effectively transfers structural motion priors into the endoscopic domain, improving trajectory consistency under challenging conditions. This dual refinement pipeline enables accurate depth and pose predictions, effectively addressing the GI tract's complex textures and lighting. Extensive evaluations on the SimCol, C3VD, and EndoSLAM datasets confirm our framework's superior performance over published self-supervised methods in endoscopic depth and pose estimation. All data descriptions and code are available at https://github.com/EricXuziang/Self-supervised-with-Latent-Priors.git.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Xu, Z., Li, B., Hu, Y., Zhang, C., East, J., Ali, S., & Rittscher, J. (2026). Self-supervised monocular depth and pose estimation for endoscopy with latent priors. IEEE Transactions on Medical Imaging.

MLA Style

Xu, Z, et al. “Self-Supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors.” IEEE Transactions on Medical Imaging, 2026.

Chicago Style

Xu, Z, B Li, Y Hu, et al. 2026. “Self-Supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors.” IEEE Transactions on Medical Imaging.

Access Document

Files:: Xu_et_al_2026_Self-supervised_monocular_depth.pdf

(Accepted manuscript, pdf, 2.1MB)

Publisher copy:: 10.1109/tmi.2026.3671423

Authors

Xu, Z

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author
ORCID:: 0000-0002-3883-3716

Li, B

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Hu, Y

Role:: Author

Zhang, C

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

East, J

Institution:: University of Oxford
Division:: MSD
Department:: NDM
Role:: Author
ORCID:: 0000-0001-8035-3700


Funding

Academy of Medical Sciences

Funder identifier:: https://ror.org/00c489v88
Grant:: SBF00101191

Engineering and Physical Sciences Research Council

Funder identifier:: 10.13039/501100000266
Grant:: UKRI914

Ludwig Institute for Cancer Research

Funder identifier:: 10.13039/100009729

NIHR Oxford Biomedical Research Centre

Funder identifier:: 10.13039/501100013373

Bibliographic Details

Publisher:: IEEE
Journal:: IEEE Transactions on Medical Imaging
Place of publication:: United States
Publication date:: 2026-03-09
DOI:: 10.1109/tmi.2026.3671423
EISSN:: 1558-254X
ISSN:: 0278-0062
Pmid:: 41801778

Item Description

Language:: English
Keywords:: cameras; self-supervised learning; deep learning; endoscopy; monocular depth and pose estimation; endoscopes; pose estimation; depth measurement; three-dimensional displays; generators; accuracy; lesions; image reconstruction; colonoscopy
Pubs id:: 2392771
Local pid:: pubs:2392771
Source identifiers:: W7134846593
Deposit date:: 2026-04-29
ARK identifier:: ark:/29072/ora_e98341a7a45e40d18e86397d94cf3db7

Terms of use

Copyright holder:: IEEE
Notes:: The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.

Licence:: CC Attribution (CC BY)
