Preprint
Cluster-first labelling: an automated pipeline for segmentation and morphological clustering in histology whole slide images
- Abstract:
- Labelling tissue components in histology whole slide images (WSIs) is prohibitively labour-intensive: a single slide may contain tens of thousands of structures--cells, nuclei, and other morphologically distinct objects--each requiring manual boundary delineation and classification. We present a cloudnative, end-to-end pipeline that automates this process through a cluster-first paradigm. Our system tiles WSIs, filters out tiles deemed unlikely to contain valuable information, segments tissue components with Cellpose-SAM (including cells, nuclei, and other morphologically similar structures), extracts neural embeddings via a pretrained ResNet-50, reduces dimensionality with UMAP, and groups morphologically similar objects using DBSCAN clustering. Under this paradigm, a human annotator labels representative clusters rather than individual objects, reducing annotation effort by orders of magnitude. We evaluate the pipeline on 3,696 tissue components across 13 diverse tissue types from three species (human, rat, rabbit), measuring how well unsupervised clusters align with independent human labels via per-tile Hungarian-algorithm matching. Our system achieves a weighted cluster-label alignment accuracy of 96.8%, with 7 of 13 tissue types reaching perfect agreement. The pipeline, a companion labelling web application, and all evaluation code are released as open-source software.
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
Actions
Access Document
- Files:
-
-
(Preview, Pre-print, pdf, 769.0KB, Terms of use)
-
- Preprint server copy:
- 10.48550/arXiv.2604.09370
Authors
- Preprint server:
- arXiv
- Publication date:
- 2026-04-10
- DOI:
- EISSN:
-
2331-8422
- Language:
-
English
- Pubs id:
-
2405473
- Local pid:
-
pubs:2405473
- Deposit date:
-
2026-04-13
- ARK identifier:
Terms of use
- Copyright holder:
- Ahmed et al
- Copyright date:
- 2026
- Rights statement:
- ©2026 The Authors. This paper is an open access article distributed under the terms of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)
- Licence:
- CC Attribution (CC BY)
If you are the owner of this record, you can report an update to it here: Report update to this record