Dataset: Software

Crowdsourced data tools

Documentation:
Extracting Keywords from Crowdsourced Collections was a Digital Scholarship @ Oxford (DiSc) Research Development Grant-funded project based in the Faculty of English at the University of Oxford, which ran from November 2024 to July 2025. Using the Their Finest Hour Online Archive, a digital collection of 2,000+ records and 26,000+ files related to the Second World War, as a case study, the project set out to explore how Natural Language Processing (NLP) methods could be used to extract keywords from crowdsourced digital collection data.

Assigning appropriate keyword tags to digital collection records is a crucial step in supporting search and discovery, as well as adherence to FAIR data principles. Traditionally, keywords have been assigned manually, often from a pre-defined or inherited controlled vocabulary used within a particular institution. Manual tagging can be resource-intensive, can lead to the misrepresentation of records or collections, and can perpetuate historic assumptions, biases and stereotypes associated with particular domains. While there have been efforts to democratise the creation of digital collections metadata, the primary assumption underpinning keyword tagging remains the same: individuals should select and then impose keywords on historical data.

This project sought to invert that assumption and to explore the extent to which NLP methods and tools can allow collection records to generate their own keyword tags, and thus describe or 'speak for' themselves. This is particularly relevant to crowdsourced collections of personal histories, especially Second World War collections, at a time when representations of the past are being reshaped to serve political interests.

Publication website:
https://github.com/Digital-Scholarship-Oxford/crowdsourced-data-tools

Authors/Creators

Institution:
University of Oxford
Division:
GLAM
Department:
Bodleian Libraries
Role:
Creator
Institution:
University of Oxford
Division:
GLAM
Department:
Bodleian Libraries Directorate
Role:
Creator
ORCID:
0009-0005-1881-5392
Institution:
University of Oxford
Division:
ContEd
Department:
Continuing Education
Role:
Creator
ORCID:
0000-0001-9480-7398
Institution:
University of Oxford
Division:
UAS
Department:
Academic Resources and Information Systems
Role:
Creator
ORCID:
0000-0002-1830-2352

Contributors

Institution:
University of Oxford
Division:
UAS
Department:
Academic Resources and Information Systems
Role:
Principal Investigator (PI)
ORCID:
0000-0002-1830-2352


Funder identifier:
https://ror.org/052gg0110
Programme:
Digital Scholarship @Oxford Research Development Grant


Publisher:
University of Oxford
Publication date:
2025
Language:
English
Subtype:
Software
Pubs id:
2368923
UUID:
uuid_5c270001-8122-401e-8d25-02a730f55a86
Local pid:
pubs:2368923
Deposit date:
2026-02-08