Multi-Scale Feature Fusion Network for Lip Recognition
Haohuai Lin, Bowen Liu, Gangdong Zhang, Qiang Yin, Liuqing Yang, Ping Lan
2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pp. 541-545, published 2024-01-26
DOI: 10.1109/ICPECA60615.2024.10471068
Visual speech recognition (VSR), also known as lip recognition, has been widely explored in recent years because of advances in deep learning. Lip recognition is a discrimination problem in which the most informative cues come from subtle movements of the lips. The model must therefore be able to extract features that capture small variations around the mouth. This paper proposes a multi-branch feature fusion network built on three-dimensional convolutional neural networks (3D CNNs) to extract spatiotemporal features from consecutive image frames. The multi-branch fusion extracts both local and global characteristics from the image sequence and further enhances this feature information, so that more accurate features reach the back-end classification network. Many existing methods perform well only when supported by large volumes of training data, so the approach is evaluated on a small-scale dataset, OuluVS2, where it gives encouraging results. Over 20 repeated runs of the experiment, the maximum accuracy improves by 0.8% in absolute terms and the average accuracy improves by 1%.
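The abstract does not specify the branch design. The sketch below only illustrates the general idea of multi-branch, multi-scale 3D convolution. It runs several 3D convolutions with different kernel sizes over the same T x H x W clip and fuses their outputs by stacking them as channels. Everything here is an assumption and not the authors' architecture: the single-channel input, kernel sizes 1, 3 and 5, the zero "same" padding, the fixed averaging kernels that stand in for learned weights, and fusion by stacking. The helper names `conv3d_same`, `box_kernel` and `multi_branch_fusion` are hypothetical.

```python
"""Hypothetical multi-branch (multi-scale) 3D convolution sketch.

Assumptions (not from the paper): single-channel clip indexed [t][y][x],
one fixed averaging kernel per branch instead of learned weights,
kernel sizes 1, 3, 5, zero "same" padding, and fusion by stacking
branch outputs as channels.
"""
from typing import List, Sequence

Clip = List[List[List[float]]]  # indexed [t][y][x]


def conv3d_same(clip: Clip, kernel: Clip) -> Clip:
    """3D cross-correlation with zero padding; output size equals input size."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    k = len(kernel)  # cubic kernel with odd side length k
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for dt in range(k):
                    tt = t + dt - r
                    if not 0 <= tt < T:
                        continue  # outside the clip in time: zero padding
                    for dy in range(k):
                        yy = y + dy - r
                        if not 0 <= yy < H:
                            continue  # outside the frame vertically
                        for dx in range(k):
                            xx = x + dx - r
                            if 0 <= xx < W:
                                acc += clip[tt][yy][xx] * kernel[dt][dy][dx]
                out[t][y][x] = acc
    return out


def box_kernel(k: int) -> Clip:
    """Uniform k x k x k averaging kernel (stand-in for learned weights)."""
    v = 1.0 / k ** 3
    return [[[v] * k for _ in range(k)] for _ in range(k)]


def multi_branch_fusion(clip: Clip, sizes: Sequence[int] = (1, 3, 5)) -> List[Clip]:
    """Run one 3D conv branch per kernel size; stack the outputs as channels.

    Small kernels respond to fine local motion and larger kernels aggregate
    wider spatiotemporal context. Stacking keeps both for a later stage.
    """
    return [conv3d_same(clip, box_kernel(k)) for k in sizes]


if __name__ == "__main__":
    # Toy 4-frame, 6x6 clip of ones (hypothetical input, not OuluVS2 data).
    clip = [[[1.0] * 6 for _ in range(6)] for _ in range(4)]
    fused = multi_branch_fusion(clip)
    print(len(fused), "branches")  # one channel per kernel size
```

With "same" padding, every branch keeps the input's T x H x W shape, so the outputs can be stacked directly. In a trainable model, each branch would typically be a learnable layer such as PyTorch's `torch.nn.Conv3d` with `padding=k // 2`, and the stacked channels would feed the back-end classifier.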