Journal article

Reducing annotation burden in physical activity research using vision language models

Abstract:
Data from wearable devices collected in free-living settings, and labelled with physical activity behaviours compatible with health research, are essential for both validating existing wearable-based measurement approaches and developing novel machine learning approaches. One common way of obtaining these labels relies on laborious human annotation of sequences of images captured by body-worn cameras. The aim of this study was to investigate whether open-source vision-language models could accurately annotate activity intensity classes in wearable camera-based validation studies, thereby reducing the annotation burden. We compared the performance of three vision language models and two discriminative models on two free-living validation studies with 161 and 111 participants, collected in Oxfordshire, United Kingdom and Sichuan, China, respectively, using the Autographer (OMG Life, defunct) wearable camera. We found that the best open-source vision-language model (VLM) and fine-tuned discriminative model (DM) achieved comparable performance when predicting sedentary behaviour from single images on unseen participants in the Oxfordshire study; median F1-scores: VLM = 0.89 (0.84, 0.92), DM = 0.91 (0.86, 0.95). Performance declined for light [VLM = 0.60 (0.56, 0.67), DM = 0.70 (0.63, 0.79)], and moderate-to-vigorous intensity physical activity [VLM = 0.66 (0.53, 0.85); DM = 0.72 (0.58, 0.84)]. When applied to the external Sichuan study, performance fell across all intensity categories, with median Cohen’s κ scores falling from 0.54 (0.49, 0.64) to 0.26 (0.15, 0.37) for the VLM, and from 0.67 (0.60, 0.74) to 0.19 (0.10, 0.30) for the DM. Freely available computer vision models could help annotate sedentary behaviour, typically the most prevalent activity of daily living, from wearable camera images within similar populations to seen data, reducing the annotation burden when using cameras as the source of ground-truth.
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1038/s41598-025-21350-6

Authors

Institution:
University of Oxford
Division:
MSD
Department:
Nuffield Department of Population Health
Sub department:
Population Health
Role:
Author
Institution:
University of Oxford
Division:
MSD
Department:
Nuffield Department of Population Health
Sub department:
Population Health
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Sub department:
Computer Science
Role:
Author
Institution:
University of Oxford
Division:
MSD
Department:
Nuffield Department of Population Health
Sub department:
Population Health
Role:
Author



Publisher:
Nature Research
Journal:
Scientific Reports
Volume:
15
Issue:
1
Article number:
37253
Publication date:
2025-10-24
Acceptance date:
2025-09-19
DOI:
10.1038/s41598-025-21350-6
EISSN:
2045-2322
ISSN:
2045-2322


Source identifiers:
3407128
Deposit date:
2024-10-24
ARK identifier:
