Journal article

Self-supervised monocular depth and pose estimation for endoscopy with latent priors

Abstract:
Accurate 3D reconstruction in endoscopy enables quantitative and holistic lesion characterization within the gastrointestinal (GI) tract. This requires reliable depth and pose estimation. However, endoscopy systems are monocular, and existing methods that rely on synthetic datasets or complex models often generalize poorly under challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a StyleGAN-based generator and a Variational Autoencoder (VAE). The StyleGAN generator leverages extensive depth scenes from natural images to condition the depth network, using latent feature priors to make depth predictions more realistic and robust. For pose estimation, we reformulate the problem within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. To further enhance pose stability and generalizability, we introduce a prior transfer module that distills motion knowledge from natural-scene SLAM systems. Specifically, pose priors from a pretrained SLAM model, supervised on large-scale natural-scene datasets, guide the latent pose distribution through a KL-divergence reparameterization. This mechanism transfers structural motion priors into the endoscopic domain, improving trajectory consistency under challenging conditions. This dual refinement pipeline yields accurate depth and pose predictions despite the complex textures and lighting of the GI tract. Extensive evaluations on the SimCol, C3VD, and EndoSLAM datasets show that our framework outperforms published self-supervised methods in endoscopic depth and pose estimation. All data descriptions and code are available at https://github.com/EricXuziang/Self-supervised-with-Latent-Priors.git.
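The abstract states that pose priors from a pretrained SLAM model guide the latent pose distribution through a KL-divergence reparameterization. The sketch below illustrates that general idea only and is not the authors' implementation (their code is in the repository linked in the abstract). It assumes a 6-DoF pose latent with a diagonal Gaussian posterior predicted by a pose encoder, and a Gaussian prior centred on the SLAM-estimated pose with a fixed variance. The names `slam_prior_kl` and `prior_logvar`, the 6-DoF layout, and the prior variance value are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence

POSE_DIM = 6  # assumption: 6-DoF pose latent (tx, ty, tz, rx, ry, rz)


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (VAE reparameterization trick)."""
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(mu_q: Sequence[float], logvar_q: Sequence[float],
                      mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """Closed-form KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    if not (len(mu_q) == len(logvar_q) == len(mu_p) == len(logvar_p)):
        raise ValueError("all inputs must have the same length")
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Per dimension: log(sigma_p / sigma_q) + (sigma_q^2 + (mq - mp)^2) / (2 sigma_p^2) - 1/2
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def slam_prior_kl(pred_mu: Sequence[float], pred_logvar: Sequence[float],
                  slam_pose: Sequence[float],
                  prior_logvar: float = math.log(0.01)) -> float:
    """Hypothetical prior-transfer term: pull the predicted pose posterior toward a
    Gaussian centred on the SLAM pose, using an assumed fixed prior variance."""
    return kl_diag_gaussians(pred_mu, pred_logvar, slam_pose,
                             [prior_logvar] * len(slam_pose))


if __name__ == "__main__":
    rng = random.Random(0)
    slam_pose = [0.0, 0.0, 0.05, 0.0, 0.01, 0.0]   # made-up relative pose from a SLAM model
    pred_mu = [0.01, -0.01, 0.04, 0.0, 0.01, 0.0]  # made-up pose-encoder means
    pred_logvar = [math.log(0.01)] * POSE_DIM      # made-up pose-encoder log-variances
    z = reparameterize(pred_mu, pred_logvar, rng)  # sampled pose for view synthesis
    print("sampled pose:", z)
    print("KL to SLAM prior:", slam_prior_kl(pred_mu, pred_logvar, slam_pose))
```

Because the sample is written as `mu + sigma * eps`, gradients of a downstream self-supervised loss can flow into the predicted mean and variance. The KL term then acts as a regularizer that penalizes posteriors drifting far from the SLAM prior, scaled by the assumed prior variance.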
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1109/tmi.2026.3671423

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0002-3883-3716
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MSD
Department:
NDM
Role:
Author
ORCID:
0000-0001-8035-3700


Funder identifier:
https://ror.org/00c489v88
Grant:
SBF00101191
Funder identifier:
10.13039/501100000266
Grant:
UKRI914
Funder identifier:
10.13039/501100013373


Publisher:
IEEE
Journal:
IEEE Transactions on Medical Imaging
Place of publication:
United States
Publication date:
2026-03-09
DOI:
10.1109/tmi.2026.3671423
EISSN:
1558-254X
ISSN:
0278-0062
PMID:
41801778

