Multi-Scale Feature Fusion Network for Lip Recognition

Haohuai Lin, Bowen Liu, Gangdong Zhang, Qiang Yin, Liuqing Yang, Ping Lan
DOI: 10.1109/ICPECA60615.2024.10471068 (https://doi.org/10.1109/ICPECA60615.2024.10471068)
Published in: 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pp. 541-545
Publication date: 2024-01-26
Citations: 0

Abstract

Visual speech recognition (VSR), also known as lip recognition, has been widely explored in recent years thanks to the development of deep learning. Lip recognition is a discrimination problem in which the most informative cue is the delicate movement of the lips. This places a high demand on the model's ability to extract features of minor variations around the lips. This paper proposes a multi-branch feature fusion network built on three-dimensional convolutional neural networks (3D CNNs) for extracting spatiotemporal features from continuous image sequences. The multi-branch fusion network extracts both local and global characteristics from the image sequence and further enhances the feature information, so that more accurate feature information is delivered to the back-end classification network. Many well-performing methods depend on large volumes of data; to test the effect on a small-scale dataset, the experiments are conducted on the OuluVS2 dataset, with encouraging results. After 20 iterations of the experiment, the maximum accuracy improves by 0.8% in absolute terms and the average accuracy improves by 1%.
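The abstract does not specify the branch layout, kernel sizes, or fusion operator, so the following is only a minimal sketch of the general idea of multi-branch spatiotemporal feature fusion, not the authors' network. It assumes three branches that differ in spatial kernel size, fixed hand-written kernels in place of learned weights, and fusion by concatenation. All names (`conv3d_valid`, `motion_kernel`, `pooled_response`, `multi_branch_fusion`) are hypothetical.

```python
def conv3d_valid(clip, kernel):
    """Single-channel 'valid' 3D cross-correlation.

    clip is indexed as clip[t][y][x], kernel as kernel[dt][dy][dx].
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                  for dt in range(kt) for dy in range(kh) for dx in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]


def motion_kernel(size):
    """Hypothetical fixed 3 x size x size kernel (an assumption, not learned).

    It averages (last frame - first frame) over a size x size window, so a
    small size reacts to local change and a large size to broader change.
    """
    w = 1.0 / (size * size)
    first = [[-w] * size for _ in range(size)]
    middle = [[0.0] * size for _ in range(size)]
    last = [[w] * size for _ in range(size)]
    return [first, middle, last]


def pooled_response(volume):
    """Apply ReLU, then global average pooling: one scalar per branch."""
    values = [max(v, 0.0) for plane in volume for row in plane for v in row]
    return sum(values) / len(values)


def multi_branch_fusion(clip, spatial_sizes=(1, 3, 5)):
    """Run one branch per spatial scale and fuse by concatenation."""
    return [pooled_response(conv3d_valid(clip, motion_kernel(s)))
            for s in spatial_sizes]


if __name__ == "__main__":
    # Toy 4-frame, 6x6 clip whose brightness rises by one unit per frame.
    ramp = [[[float(t)] * 6 for _ in range(6)] for t in range(4)]
    print(multi_branch_fusion(ramp))
```

In a trainable version, each branch would be a stack of learned 3D convolutions and the fused vector would feed the back-end classifier; the fixed kernels here only make the per-scale behaviour easy to inspect.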