Journal article icon

Journal article

PhysFormer++: facial video-based physiological measurement with slowfast temporal difference transformer

Abstract:

Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction for rPPG modeling. In this paper, we propose two end-to-end video transformer based architectures, namely PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal difference guided global attention, and then refine the local spatio-temporal representation against interference. To better exploit the temporal contextual and periodic rPPG clues, we also extend the PhysFormer to the two-pathway SlowFast based PhysFormer++ with temporal difference periodic and cross-attention transformers. Furthermore, we propose the label distribution learning and a curriculum learning inspired dynamic constraint in frequency domain, which provide elaborate supervisions for PhysFormer and PhysFormer++ and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra- and cross-dataset testings. Unlike most transformer networks needed pretraining from large-scale datasets, the proposed PhysFormer family can be easily trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community.

Publication status:
Published
Peer review status:
Peer reviewed

Actions


Access Document


Files:
Publisher copy:
10.1007/s11263-023-01758-1

Authors


More by this author
Role:
Author
ORCID:
0000-0001-6505-3304
More by this author
Institution:
University of Oxford
Role:
Author


Publisher:
Springer Nature
Journal:
International Journal of Computer Vision More from this journal
Volume:
131
Pages:
1307–1330
Publication date:
2023-01-15
Acceptance date:
2023-01-06
DOI:
EISSN:
1573-1405
ISSN:
0920-5691


Language:
English
Keywords:
Pubs id:
1329878
Local pid:
pubs:1329878
Deposit date:
2023-03-01

Terms of use



Views and Downloads






If you are the owner of this record, you can report an update to it here: Report update to this record

TO TOP