Multimodal Person Verification With Generative Thermal Data Augmentation

Madina Abdrakhmanova;Timur Unaspekov;Huseyin Atakan Varol
DOI: 10.1109/TBIOM.2023.3346938
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 1, pp. 43-53
Published: 26 December 2023
URL: https://ieeexplore.ieee.org/document/10374245/

Abstract

The fusion of audio, visual, and thermal modalities has proven effective in developing reliable person verification systems. In this study, we enhanced multimodal person verification performance by augmenting training data using domain transfer methods. Specifically, we enriched the audio-visual-thermal SpeakingFaces dataset with a combination of real audio-visual data and synthetic thermal data from the VoxCeleb dataset. We adapted visual images in VoxCeleb to the thermal domain using CycleGAN, trained on SpeakingFaces. Our results demonstrate the positive impact of augmented training data on all unimodal and multimodal models. The score fusion of unimodal audio, unimodal visual, bimodal, and trimodal systems trained on the combined data achieved the best results on both datasets and exhibited robustness in low-illumination and noisy conditions. Our findings emphasize the importance of utilizing synthetic data, produced by generative methods, to improve deep learning model performance. To facilitate reproducibility and further research in multimodal person verification, we have made our code, pretrained models, and preprocessed dataset freely available in our GitHub repository.
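The score fusion described above combines the outputs of unimodal audio, unimodal visual, bimodal, and trimodal verifiers into a single per-trial decision score. A minimal sketch of one common approach, a weighted average of min-max-normalized scores, is shown below; the function name `fuse_scores`, the normalization choice, and the equal default weights are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Fuse verification scores from several systems on the same trial list.

    Each element of score_lists is one system's raw scores for the trials.
    Scores are min-max normalized per system, then combined by a
    (optionally weighted) average across systems.
    """
    score_lists = [np.asarray(s, dtype=float) for s in score_lists]
    normalized = []
    for s in score_lists:
        lo, hi = s.min(), s.max()
        # Guard against a degenerate system whose scores are all equal.
        normalized.append((s - lo) / (hi - lo) if hi > lo else np.zeros_like(s))
    normalized = np.stack(normalized)  # shape: (n_systems, n_trials)
    if weights is None:
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    return np.average(normalized, axis=0, weights=weights)

# Toy example: three unimodal systems scoring the same three trials.
audio = [0.2, 0.9, 0.4]
visual = [0.1, 0.8, 0.7]
thermal = [0.3, 0.6, 0.5]
fused = fuse_scores([audio, visual, thermal])
```

Per-system normalization matters because raw scores from different modalities (e.g., cosine similarities vs. PLDA log-likelihood ratios) live on different scales; averaging without it lets one system dominate.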