Dataset : Software
Crowdsourced data tools
- Documentation:
- Extracting Keywords from Crowdsourced Collections was a Digital Scholarship @ Oxford (DiSc) Research Development Grant-funded project based in the Faculty of English at the University of Oxford, which ran from November 2024 to July 2025. Using the Their Finest Hour Online Archive, a digital collection of 2,000+ records and 26,000+ files related to the Second World War, as a case study, this project set out to explore how Natural Language Processing (NLP) methods could be used to extract keywords from crowdsourced digital collection data. Assigning appropriate keyword tags to digital collection records is a crucial step in supporting search and discovery, as well as adherence to FAIR data principles. Traditionally, keywords have been assigned manually, often from a pre-defined or inherited controlled vocabulary used within a particular institution. Manual keyword tagging can be resource-intensive, can lead to the misrepresentation of records or collections, and can perpetuate historic assumptions, biases and stereotypes associated with particular domains. While there have been efforts to democratise the creation of digital collections metadata, the primary assumption underpinning keyword tagging remains the same: individuals should select and then impose keywords on historical data. This project sought to invert that assumption and to explore the extent to which NLP methods and tools can allow collection records to generate their own keyword tags, and thus describe or 'speak for' themselves. This is particularly relevant to crowdsourced collections of personal histories, especially Second World War collections, at a time when representations of the past are being reshaped to serve political interests.
- Files:
- (Version of record, zip, 207.8KB, Terms of use)
- Publication website:
- https://github.com/Digital-Scholarship-Oxford/crowdsourced-data-tools
Authors/Creators
Contributors
+ Lee, S
- Institution:
- University of Oxford
- Division:
- UAS
- Department:
- Academic Resources and Information Systems
- Role:
- Principal Investigator (PI)
- ORCID:
- 0000-0002-1830-2352
+ University of Oxford
- Funder identifier:
- https://ror.org/052gg0110
- Programme:
- Digital Scholarship @Oxford Research Development Grant
- Publisher:
- University of Oxford
- Publication date:
- 2025
- DOI:
- Language:
- English
- Keywords:
- Subtype:
- Software
- Pubs id:
- 2368923
- UUID:
- uuid_5c270001-8122-401e-8d25-02a730f55a86
- Local pid:
- pubs:2368923
- Deposit date:
- 2026-02-08
- ARK identifier:
Terms of use
- Copyright holder:
- University of Oxford
- Copyright date:
- 2025
- Notes:
- The code and findings produced by this project are also available under an AGPLv3 license on GitHub.
- Licence:
- Other