WhisperNet: Deep Siamese Network For Emotion and Speech Tempo Invariant Visual-Only Lip-Based Biometric

Abdollah Zakeri, H. Hassanpour
DOI: 10.1109/ICSPIS54653.2021.9729394
Published in: 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), 2021-12-29

Abstract

In the past decade, the field of biometrics has been revolutionized by the rise of deep learning. Many improvements were made to older biometric methods, reducing security concerns. Before biometric person-verification methods such as facial recognition, an impostor could access a person's vital information simply by obtaining their password, e.g., by installing a key-logger on their system. Thanks to deep learning, safer biometric approaches to person verification and person re-identification, such as visual and audio-visual authentication, became possible and applicable on many devices, including smartphones and laptops. Unfortunately, facial recognition is considered by some to be a threat to personal privacy. Additionally, biometric methods that use the audio modality are not always applicable, for reasons such as audio noise in the environment. Lip-based biometric authentication (LBBA) is the process of authenticating a person using a video of their lips' movement while talking. To address the aforementioned concerns about other biometric authentication methods, a visual-only LBBA method can be used. Since people may be in different emotional states that can affect their utterance and speech tempo, a visual-only LBBA method must be able to produce an emotion- and speech-tempo-invariant embedding of the input utterance video. In this article, we propose a network inspired by the Siamese architecture that learns to produce emotion- and speech-tempo-invariant representations of input utterance videos. To train and test the proposed network, we used the CREMA-D dataset, achieving 95.41% accuracy on the validation set.
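The core idea of a Siamese verification network, as described in the abstract, is a shared-weight encoder that maps two utterance videos into an embedding space where genuine pairs (same speaker) lie close together and impostor pairs lie far apart. The sketch below illustrates this with a classic contrastive loss; the linear "encoder", feature dimensions, and margin are all hypothetical stand-ins, since the abstract does not specify WhisperNet's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared encoder: one linear layer standing in for the paper's
# video encoder (512-dim clip features -> 128-dim embedding).
W = rng.standard_normal((128, 512)) * 0.02

def embed(x):
    """Shared-weight embedding, L2-normalized so distance reflects direction only."""
    z = W @ x
    return z / np.linalg.norm(z)

def contrastive_loss(z1, z2, same_person, margin=1.0):
    """Siamese contrastive loss: pull genuine pairs together,
    push impostor pairs apart until they exceed the margin."""
    d = np.linalg.norm(z1 - z2)
    if same_person:
        return 0.5 * d**2
    return 0.5 * max(0.0, margin - d)**2

# Toy "clip features": two utterances of one speaker, one of an impostor.
a1, a2, b = (rng.standard_normal(512) for _ in range(3))

loss_genuine = contrastive_loss(embed(a1), embed(a2), same_person=True)
loss_impostor = contrastive_loss(embed(a1), embed(b), same_person=False)
```

At verification time, such a model would compare the embedding distance of an enrollment clip and a probe clip against a threshold; emotion and tempo invariance means that distance should stay small for genuine pairs regardless of how the utterance is spoken.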