PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba

Chaoqi Luo, Yiping Xie, Zitong Yu
Source: arXiv - CS - Computer Vision and Pattern Recognition
Published: 2024-09-18
Identifier: arxiv-2409.12031 (https://doi.org/arxiv-2409.12031)

Abstract

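Remote photoplethysmography (rPPG) based on facial video measures physiological signals and monitors heart activity without any physical contact, and it shows significant potential in a range of applications. Previous deep-learning-based rPPG methods rely primarily on CNNs and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range spatio-temporal dependencies, while Transformers struggle to model long video sequences because of their high computational complexity. Recently, state space models (SSMs), represented by Mamba, have become known for their strong performance in capturing long-range dependencies in long sequences. In this paper, we propose PhysMamba, a Mamba-based framework that efficiently represents long-range physiological dependencies in facial videos. Specifically, we introduce the Temporal Difference Mamba block, which first enhances local dynamic differences and then models the long-range spatio-temporal context. In addition, a dual-stream SlowFast architecture fuses multi-scale temporal features. Extensive experiments on three benchmark datasets demonstrate the superiority and efficiency of PhysMamba. The code is available at https://github.com/Chaoqi31/PhysMamba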
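The three ideas named in the abstract (temporal differences, a linear-time state space recurrence, and slow/fast fusion) can be illustrated with a toy one-dimensional signal. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the scalar parameters `a`, `b`, `c`, the `stride`, and the additive fusion are all hypothetical, and the recurrence uses fixed parameters, whereas Mamba makes them input-dependent. The real model operates on video feature maps; see the repository linked above for the actual code.

```python
import math


def temporal_difference(signal):
    """Frame-to-frame difference; a leading 0.0 keeps the length unchanged."""
    return [0.0] + [signal[t] - signal[t - 1] for t in range(1, len(signal))]


def linear_ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Scalar linear state space recurrence (a fixed-parameter simplification):
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    A single pass over the sequence, so the cost grows linearly with length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def slow_fast_fuse(signal, stride=4):
    """Run a full-rate (fast) and a subsampled (slow) path, then add them.

    Additive fusion is an assumption made for illustration only.
    """
    fast = linear_ssm_scan(temporal_difference(signal))
    slow_in = signal[::stride]  # lower temporal resolution
    slow = linear_ssm_scan(temporal_difference(slow_in))
    # Nearest-neighbour upsampling of the slow path back to full length.
    slow_up = [slow[t // stride] for t in range(len(signal))]
    return [f + s for f, s in zip(fast, slow_up)]


if __name__ == "__main__":
    fps, seconds, bpm = 30, 4, 72
    # Synthetic per-frame intensity: constant baseline plus a small
    # pulse-like sinusoid at the chosen heart rate.
    signal = [
        100.0 + 0.5 * math.sin(2 * math.pi * (bpm / 60) * t / fps)
        for t in range(fps * seconds)
    ]
    fused = slow_fast_fuse(signal)
    print(len(fused))
```

Differencing removes the constant baseline so that only the small periodic variation reaches the recurrence, which is the intuition behind enhancing local dynamics before modelling long-range context.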