Journal article

A pipeline to compile expert‐verified datasets of digitised herbarium specimens for automated plant identification to accelerate taxonomy

Abstract:
Societal Impact Statement: Understanding and protecting plant life is essential for tackling the twin challenges of biodiversity loss and climate change. To support this, we have developed a new digital approach that helps identify plant species more quickly and accurately. By using images of preserved plant specimens from global collections sourced through the Global Biodiversity Information Facility and combining computer vision technology with expert knowledge from plant scientists, our approach makes it easier to catalogue and study plants. This innovation not only speeds up scientific research but also strengthens the connection between traditional physical plant collections and modern digital collections and tools, helping scientists, conservationists and communities work together to safeguard nature.
Summary: Computer vision applied to digital herbarium collections holds tremendous promise to streamline specimen identification and accelerate the work of taxonomists and herbarium curators. We present a sampling and image preprocessing pipeline applicable to any image dataset that uses the Darwin Core data standard. We tested it on Cyperaceae, a large monocot plant family known for its identification challenges, and on Rhamnaceae, a eudicot plant family, to demonstrate broad applicability across angiosperms. Digitised herbarium specimens were sampled via the Global Biodiversity Information Facility to create image datasets with balanced representation annotated with taxon labels. These were used to train deep learning models at genus level in Cyperaceae and Rhamnaceae, and at species level in the genera Bulbostylis and Ziziphus. A model fine‐tuned on the data performed efficiently and consistently achieved top‐1, top‐3 and top‐5 accuracy rates of ≥72%, ≥88% and ≥92% in identifying digitised herbarium specimens of Cyperaceae and Rhamnaceae to genus level. Species‐level identification in Bulbostylis reached 65%, 83% and 89%, while Ziziphus achieved higher rates of 72%, 85% and 90%. Our approach integrates an automated pipeline for dataset generation with expert verification to enhance data quality. This framework supports scalable, accurate identification of herbarium specimens and fosters a more dynamic relationship between digital and physical collections.
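Illustrative note on the reported metrics: the top‐1, top‐3 and top‐5 figures in the abstract follow the usual top‐k convention, in which a specimen counts as correctly identified if its true taxon is among the k highest-scoring predictions. The sketch below is a minimal illustration of that convention and is not taken from the article. The function name `top_k_accuracy`, the score matrix and the labels are all hypothetical toy values.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of specimens whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one specimen is required")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical toy data: 4 specimens scored against 4 candidate genera.
scores = [
    [0.70, 0.20, 0.05, 0.05],  # true class 0 is ranked first
    [0.10, 0.30, 0.60, 0.00],  # true class 1 is ranked second
    [0.40, 0.35, 0.15, 0.10],  # true class 2 is ranked third
    [0.50, 0.30, 0.15, 0.05],  # true class 3 is ranked last
]
labels = [0, 1, 2, 3]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(scores, labels, k):.2f}")
```

On this toy input the loop reports 0.25, 0.50 and 0.75 for k = 1, 2 and 3. Top‐k accuracy can only stay the same or rise as k grows, which is consistent with the increasing percentages reported in the abstract.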
Publication status:
Published
Peer review status:
Peer reviewed

Access Document

Files:
Publisher copy:
10.1002/ppp3.70149

Authors

Role:
Author
ORCID:
0009-0008-9810-9907
Role:
Author
ORCID:
0000-0001-7969-2609
Role:
Author
ORCID:
0000-0003-0162-7975
Role:
Author
ORCID:
0000-0002-5725-4170
Role:
Author
ORCID:
0000-0003-4549-7092


Funder identifier:
https://ror.org/04g2vpn86
Funder identifier:
https://ror.org/03zttf063
Funder identifier:
https://ror.org/05a28rw58
Funder identifier:
https://ror.org/00ynnr806


Publisher:
Wiley
Journal:
Plants, People, Planet
Publication date:
2025-12-23
Acceptance date:
2025-11-04
DOI:
10.1002/ppp3.70149
EISSN:
2572-2611
ISSN:
2572-2611


Language:
English
Keywords:
UUID:
uuid_ca1df2dd-bcec-41ec-b601-66f0ec6167d7
Source identifiers:
3590594
Deposit date:
2025-12-23
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
