Conference item

Evaluating histopathology foundation models for few-shot tissue clustering: an application to LC25000 augmented dataset cleaning

Abstract:
Recent digital histopathology datasets have significantly advanced the development of deep learning-based histopathology frameworks. However, data leakage in model training can lead to artificially high metrics that do not genuinely reflect the strength of the approach. The LC25000 dataset, consisting of tissue image tiles extracted from lung and colon samples, is a popular benchmark dataset. In the released version, tissue tiles were augmented randomly and mixed. Nevertheless, many studies report near-perfect accuracy scores, often due to data leakage, where augmented images of the same tissue tile are split into both training and test sets. To improve the quality of performance reports, we develop a semi-automatic pipeline to clean LC25000. By clustering and separating all augmented images of the same tiles, using recently proposed histopathology foundation models and manual correction, we create a clean version of LC25000. We then evaluate the quality of features extracted by these foundation models, using the clustering task as a benchmark. Our contributions are: 1) We publicly release our semi-automatic annotation pipeline along with the LC25000-clean dataset to facilitate appropriate utilization of this dataset, reducing the risk of overestimating models’ performance; 2) We profile various combinations of feature extraction and clustering methods for identifying duplicates of the same image generated by basic image transformations; 3) We propose the clustering task as a minimal-setup benchmark to evaluate the quality of tissue image features learned by histopathology foundation models. Clustering labels, annotation pipeline, and evaluation code: https://github.com/GeorgeBatch/LC25000-clean.
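The leakage described in the abstract can be avoided once every image carries a cluster label identifying its source tile. The following is a minimal sketch of such a group-aware split and is not the authors' released code: the function name `split_by_cluster`, the file names, and the cluster ids are hypothetical and serve only as illustration.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Split image names into train/test so no cluster spans both sets.

    cluster_of: dict mapping image name -> cluster id (hypothetical labels;
    each cluster is assumed to hold all augmented copies of one source tile).
    """
    # Group image names by their cluster id.
    groups = defaultdict(list)
    for name, cluster_id in cluster_of.items():
        groups[cluster_id].append(name)

    # Shuffle whole clusters, not individual images.
    cluster_ids = sorted(groups)
    random.Random(seed).shuffle(cluster_ids)

    # Hold out a fraction of the clusters (at least one) for testing.
    n_test = max(1, round(test_fraction * len(cluster_ids)))
    test_ids = set(cluster_ids[:n_test])

    train = [n for c in cluster_ids if c not in test_ids for n in groups[c]]
    test = [n for c in cluster_ids if c in test_ids for n in groups[c]]
    return train, test


# Toy example: three source tiles, each with a few augmented copies.
toy = {
    "img_001.jpeg": 0, "img_002.jpeg": 0, "img_003.jpeg": 0,
    "img_004.jpeg": 1, "img_005.jpeg": 1,
    "img_006.jpeg": 2, "img_007.jpeg": 2, "img_008.jpeg": 2,
}
train, test = split_by_cluster(toy, test_fraction=0.34, seed=0)
```

Because whole clusters are assigned to one side, augmented copies of the same tile cannot appear in both the training and the test set, which is the condition a random per-image split violates.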
Publication status:
Published
Peer review status:
Peer reviewed

Access Document


Publisher copy:
10.1007/978-3-031-73748-0_2

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Linacre College
Role:
Author
ORCID:
0000-0002-4899-4935
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/S02428X/1


Publisher:
Springer
Host title:
Data Engineering in Medical Imaging: Second MICCAI Workshop, DEMI 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024, Proceedings
Pages:
11-21
Series:
Lecture Notes in Computer Science
Series number:
15265
Place of publication:
Cham, Switzerland
Publication date:
2024-10-25
Acceptance date:
2024-07-16
Event title:
2nd Workshop in Data Engineering in Medical Imaging (DEMI) at MICCAI 2024
Event location:
Marrakesh, Morocco
Event website:
https://demi-workshop.github.io/
Event start date:
2024-10-10
Event end date:
2024-10-10
EISSN:
1611-3349
ISSN:
0302-9743
EISBN:
9783031737480
ISBN:
9783031737473


Language:
English
Pubs id:
2016327
Local pid:
pubs:2016327
Deposit date:
2024-07-17
