Visual Speech Recognition for Kannada Language Using VGG16 Convolutional Neural Network

Acoustics (Basel, Switzerland) | IF 1.3 | JCR Q3 (Acoustics) | Pub Date: 2023-03-16 | DOI: 10.3390/acoustics5010020

Shashidhar Rudregowda, Sudarshan Patil Kulkarni, Gururaj H L, Vinayakumar Ravi, M. Krichen


Citations: 2

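Abstract

Visual speech recognition (VSR) is a method of reading speech by observing a speaker's lip movements, so it depends heavily on the visual features derived from image sequences. It is a demanding process that poses various challenges for human- and machine-based procedures, and VSR methods address these challenges with machine learning. Visual speech helps people who are hearing impaired, laryngeal patients, and people in noisy environments. In this research, the authors developed their own dataset for the Kannada language. The dataset contains five randomly chosen words: Avanu, Bagge, Bari, Guruthu, and Helida. Each video lasts 1 s to 1.2 s on average. Machine learning is used for both feature extraction and classification: the authors applied a VGG16 convolutional neural network with the ReLU activation function to the custom dataset and obtained an accuracy of 91.90%. The result is compared with HCNN, ResNet-LSTM, Bi-LSTM, and GLCM-ANN, and the comparison supports the effectiveness of the proposed system.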
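As a rough illustration of what a VGG16 classifier for five word classes looks like, the sketch below walks the standard VGG16 layer layout in plain Python and reports the final feature-map shape and the parameter count. It is not the authors' implementation: the 224x224x3 input, the 4096-unit fully connected layers, and the names `VGG16_CONV_CFG`, `WORDS`, `relu`, and `vgg16_summary` are assumptions made for this example, since the abstract states only the architecture name, the ReLU activation, and the five words.

```python
# Standard VGG16 layout: numbers are 3x3 conv output channels,
# "M" marks a 2x2 max-pooling layer.
VGG16_CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]

# The five word classes named in the abstract.
WORDS = ["Avanu", "Bagge", "Bari", "Guruthu", "Helida"]


def relu(x):
    """ReLU activation: max(0, x)."""
    return x if x > 0 else 0.0


def vgg16_summary(num_classes, input_size=224, in_channels=3, hidden=4096):
    """Return (feature_map_shape, parameter_count) for a VGG16-style classifier.

    input_size, in_channels and hidden are the usual VGG16 defaults and are
    assumptions here; the abstract does not state them.
    """
    size, channels, params = input_size, in_channels, 0
    for layer in VGG16_CONV_CFG:
        if layer == "M":
            size //= 2  # 2x2 max pooling halves height and width
        else:
            # 3x3 kernel weights plus one bias per output channel;
            # padding of 1 keeps the spatial size unchanged
            params += 3 * 3 * channels * layer + layer
            channels = layer
    flat = size * size * channels
    # Three fully connected layers: flat -> hidden -> hidden -> num_classes
    for n_in, n_out in [(flat, hidden), (hidden, hidden), (hidden, num_classes)]:
        params += n_in * n_out + n_out
    return (size, size, channels), params


shape, params = vgg16_summary(num_classes=len(WORDS))
print(shape, params)  # (7, 7, 512) 134281029
```

Under these assumptions, swapping the usual 1000-way output layer for a five-way one changes only the last fully connected layer; the convolutional stack and its 7x7x512 output are unchanged.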




Source journal

Acoustics (Basel, Switzerland)

CiteScore

3.70

Self-citation rate

0.00%

Articles published

Review time

11 weeks