Preprint

Cluster-first labelling: an automated pipeline for segmentation and morphological clustering in histology whole slide images

Abstract:
Labelling tissue components in histology whole slide images (WSIs) is prohibitively labour-intensive: a single slide may contain tens of thousands of structures (cells, nuclei, and other morphologically distinct objects), each requiring manual boundary delineation and classification. We present a cloud-native, end-to-end pipeline that automates this process through a cluster-first paradigm. Our system tiles WSIs, filters out tiles deemed unlikely to contain valuable information, segments tissue components (cells, nuclei, and other morphologically similar structures) with Cellpose-SAM, extracts neural embeddings via a pretrained ResNet-50, reduces dimensionality with UMAP, and groups morphologically similar objects using DBSCAN clustering. Under this paradigm, a human annotator labels representative clusters rather than individual objects, reducing annotation effort by orders of magnitude. We evaluate the pipeline on 3,696 tissue components across 13 diverse tissue types from three species (human, rat, rabbit), measuring how well unsupervised clusters align with independent human labels via per-tile Hungarian-algorithm matching. Our system achieves a weighted cluster-label alignment accuracy of 96.8%, with 7 of 13 tissue types reaching perfect agreement. The pipeline, a companion labelling web application, and all evaluation code are released as open-source software.
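Illustrative sketch (not taken from the preprint): the evaluation step described in the abstract can be read as finding, for each tile, the one-to-one mapping between cluster IDs and human labels that maximises the number of agreeing objects, then pooling the matched counts across tiles so that each tile is weighted by its object count. The exact weighting scheme used by the authors is not stated here, so that pooling is an assumption. The function names `tile_alignment` and `weighted_accuracy` are hypothetical. To stay dependency-free, the sketch finds the optimal assignment by exhaustive search over permutations rather than the Hungarian algorithm itself. This gives the same optimum but is only practical for a handful of clusters per tile. A polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would normally be used instead.

```python
from collections import Counter
from itertools import permutations


def tile_alignment(cluster_ids, labels):
    """Return (matched, total) for one tile under the best one-to-one
    cluster-to-label mapping (exhaustive search; small inputs only)."""
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(labels))
    # Co-occurrence counts: how many objects fall in (cluster, label).
    counts = Counter(zip(cluster_ids, labels))
    # Pad the shorter side with None so every permutation is a valid
    # one-to-one mapping; pairs involving None contribute zero matches.
    size = max(len(clusters), len(classes))
    padded_clusters = clusters + [None] * (size - len(clusters))
    padded_classes = classes + [None] * (size - len(classes))
    best = 0
    for perm in permutations(padded_classes):
        matched = sum(counts.get(pair, 0) for pair in zip(padded_clusters, perm))
        best = max(best, matched)
    return best, len(labels)


def weighted_accuracy(tiles):
    """Pool matched and total counts over tiles (assumed weighting:
    each tile contributes in proportion to its number of objects)."""
    matched = total = 0
    for cluster_ids, labels in tiles:
        m, n = tile_alignment(cluster_ids, labels)
        matched += m
        total += n
    return matched / total if total else 0.0


# Toy data, invented purely for illustration.
tile_a = ([0, 0, 1, 1, 1], ["nucleus", "nucleus", "cell", "cell", "nucleus"])
tile_b = ([2, 2, 2], ["cell", "cell", "cell"])
print(weighted_accuracy([tile_a, tile_b]))  # 7 of 8 objects agree -> 0.875
```

In the toy data, tile_a has one object whose cluster disagrees with its label under the best mapping and tile_b agrees fully, so 7 of 8 objects are matched.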
Publication status:
Published
Peer review status:
Not peer reviewed

Preprint server copy:
10.48550/arXiv.2604.09370

Authors

More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0001-4771-2721
More by this author
Institution:
University of Oxford
Division:
MSD
Department:
Physiology Anatomy and Genetics
Role:
Author
ORCID:
0000-0003-0403-3945
More by this author
Institution:
University of Oxford
Division:
MSD
Department:
Divisional Administration
Role:
Author
More by this author
Institution:
University of Oxford
Division:
MSD
Department:
Divisional Administration
Role:
Author


Preprint server:
arXiv
Publication date:
2026-04-10
DOI:
EISSN:
2331-8422


Language:
English
Pubs id:
2405473
Local pid:
pubs:2405473
Deposit date:
2026-04-13
ARK identifier:
