Journal article
A pipeline to compile expert‐verified datasets of digitised herbarium specimens for automated plant identification to accelerate taxonomy
- Abstract:
- Societal Impact Statement: Understanding and protecting plant life is essential for tackling the twin challenges of biodiversity loss and climate change. To support this, we have developed a new digital approach that helps identify plant species more quickly and accurately. By using images of preserved plant specimens from global collections sourced through the Global Biodiversity Information Facility and combining computer vision technology with expert knowledge from plant scientists, our approach makes it easier to catalogue and study plants. This innovation not only speeds up scientific research but also strengthens the connection between traditional physical plant collections and modern digital collections and tools, helping scientists, conservationists and communities work together to safeguard nature.
- Summary: Computer vision applied to digital herbarium collections holds tremendous promise to streamline specimen identification and accelerate the work of taxonomists and herbarium curators. We present a sampling and image preprocessing pipeline applicable to any image dataset that uses the Darwin Core data standard. We tested it on Cyperaceae, a large monocot plant family known for its identification challenges, and on Rhamnaceae, a eudicot plant family, to demonstrate broad applicability across angiosperms. Digitised herbarium specimens were sampled via the Global Biodiversity Information Facility to create image datasets with balanced representation annotated with taxon labels. These were used to train deep learning models at genus level in Cyperaceae and Rhamnaceae, and at species level in the genera Bulbostylis and Ziziphus. A model fine‐tuned on the data performed efficiently and consistently achieved top‐1, top‐3 and top‐5 accuracy rates of ≥72%, ≥88% and ≥92% in identifying digitised herbarium specimens of Cyperaceae and Rhamnaceae to genus level. Species‐level identification in Bulbostylis reached top‐1, top‐3 and top‐5 accuracy of 65%, 83% and 89%, while Ziziphus achieved higher rates of 72%, 85% and 90%. Our approach integrates an automated pipeline for dataset generation with expert verification to enhance data quality. This framework supports scalable, accurate identification of herbarium specimens and fosters a more dynamic relationship between digital and physical collections.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
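Note on the accuracy figures in the abstract: top‐1, top‐3 and top‐5 accuracy are conventionally the share of specimens whose correct taxon appears among the model's 1, 3 or 5 highest‐scoring predictions. The sketch below illustrates that standard definition only; it is not taken from the article, and the function name `top_k_accuracy` and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for four specimens over three classes (indices 0..2).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.7, 0.1],
]
toy_labels = [0, 1, 0, 1]  # hypothetical true class index per specimen

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2f}")
```

Accuracy can only stay the same or rise as k grows, which is why the reported top‐5 rates are at least as high as the top‐1 rates.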
Access Document
- Publisher copy:
- 10.1002/ppp3.70149
Authors
- Funder:
- Royal Holloway University of London
- Funder identifier:
- https://ror.org/04g2vpn86
- Publisher:
- Wiley
- Journal:
- Plants, People, Planet
- Publication date:
- 2025-12-23
- Acceptance date:
- 2025-11-04
- DOI:
- 10.1002/ppp3.70149
- EISSN:
- 2572-2611
- ISSN:
- 2572-2611
- Language:
- English
- Keywords:
- UUID:
- uuid_ca1df2dd-bcec-41ec-b601-66f0ec6167d7
- Source identifiers:
- 3590594
- Deposit date:
- 2025-12-23
- ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
Terms of use
- Copyright date:
- 2025
- Licence:
- CC Attribution (CC BY)