Wei Qian;Dan Guo;Kun Li;Xiaowei Zhang;Xilan Tian;Xun Yang;Meng Wang
Title: Dual-Path TokenLearner for Remote Photoplethysmography-Based Physiological Measurement With Facial Videos
Journal: IEEE Transactions on Computational Social Systems (impact factor 4.5; JCR Q1, Computer Science, Cybernetics)
Published: 2024-02-27
DOI: 10.1109/TCSS.2024.3356713
URL: https://ieeexplore.ieee.org/document/10445699/
Citation count: 0
Abstract
Remote photoplethysmography (rPPG)-based physiological measurement is an emerging yet crucial vision task. Its challenge lies in predicting rPPG signals accurately, in a noncontact manner, from facial videos corrupted by noise such as illumination variations, facial occlusions, and head movements. Existing mainstream convolutional neural network (CNN)-based models detect physiological signals by capturing the subtle color changes that heartbeats cause in facial regions of interest (ROIs). However, such models are constrained by the limited local spatial or temporal receptive fields of their neural units. In contrast, this article proposes a native transformer-based framework called dual-path TokenLearner (dual-TL), which uses learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video. Specifically, dual-TL uses a spatial TokenLearner (S-TL) to explore associations among different facial ROIs, which keeps the rPPG prediction away from noisy ROI disturbances. Complementarily, a temporal TokenLearner (T-TL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements. The two TokenLearners, S-TL and T-TL, are executed in a dual-path mode, which enables the model to reduce noise disturbances in the final rPPG signal prediction. Extensive experiments are conducted on four physiological measurement benchmark datasets. Dual-TL achieves state-of-the-art performance in both intra-dataset and cross-dataset testing, demonstrating its immense potential as a basic backbone for rPPG measurement.
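The abstract does not give the internals of S-TL or T-TL, so the following is only a rough sketch of the general idea of token pooling along two axes, not the authors' architecture. It assumes the video has already been reduced to one feature vector per frame and per ROI, and it uses plain dot-product attention with randomly initialised query vectors standing in for learned tokens. The names token_pool, dual_path, spatial_queries, and temporal_queries are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_pool(features, queries):
    """Summarise n feature vectors into len(queries) tokens.

    Each token is an attention-weighted average of the features, with
    weights given by scaled dot products against one query vector.
    (Assumption: the paper's TokenLearner may differ in detail.)
    """
    dim = len(features[0])
    tokens = []
    for query in queries:
        scores = [
            sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
            for feat in features
        ]
        weights = softmax(scores)
        tokens.append(
            [sum(w * feat[j] for w, feat in zip(weights, features))
             for j in range(dim)]
        )
    return tokens


def dual_path(video, spatial_queries, temporal_queries):
    """video is indexed as video[frame][roi][feature]."""
    n_frames, n_rois = len(video), len(video[0])
    # Spatial path: within each frame, pool across ROIs.
    spatial = [token_pool(frame, spatial_queries) for frame in video]
    # Temporal path: for each ROI, pool across frames.
    temporal = [
        token_pool([video[t][r] for t in range(n_frames)], temporal_queries)
        for r in range(n_rois)
    ]
    return spatial, temporal


# Toy input: 8 frames, 5 ROIs, 4-dimensional features (all hypothetical).
rng = random.Random(0)


def rand_vec(dim):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


video = [[rand_vec(4) for _ in range(5)] for _ in range(8)]
spatial_queries = [rand_vec(4) for _ in range(2)]   # stand-ins for learned tokens
temporal_queries = [rand_vec(4) for _ in range(3)]  # stand-ins for learned tokens

spatial, temporal = dual_path(video, spatial_queries, temporal_queries)
print(len(spatial), len(spatial[0]), len(spatial[0][0]))     # 8 2 4
print(len(temporal), len(temporal[0]), len(temporal[0][0]))  # 5 3 4
```

In this sketch the spatial path can down-weight an occluded ROI within a frame, while the temporal path can down-weight frames disturbed by motion. How the two paths are fused into a final rPPG signal is not described in the abstract and is left out here.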
About the journal:
IEEE Transactions on Computational Social Systems focuses on the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.