Knowledge-Based Systems, published 2024-09-05. DOI: 10.1016/j.knosys.2024.112430. URL: https://www.sciencedirect.com/science/article/pii/S0950705124010645
Automatic lip-reading classification using deep learning approaches and optimized quaternion meixner moments by GWO algorithm
Lip-reading classification has attracted considerable interest in recent decades because it is used in a wide variety of fields. It plays an important role in interpreting spoken words in noisy environments and in restoring communication for people with hearing impairments. Despite significant advances, existing work still has several shortcomings, notably in feature extraction and in model capability for visual speech recognition. To address them, this paper proposes an Optimized Quaternion Meixner Moments Convolutional Neural Network (OQMMs-CNN) method for building a Visual Speech Recognition (VSR) system that relies only on video images. The method combines OQMMs, optimized by the GWO algorithm, with a convolutional neural network (CNN) in order to recognize digits, words, or letters presented as input videos. The OQMMs serve as descriptors that identify, retain, and extract the essential information from the video frames (lip images) and produce the moments used as CNN input. These moments are built on Meixner polynomials, which are defined by two local parameters, α and β. The Grey Wolf Optimizer (GWO) is then applied to tune these local parameters so as to maximize classification accuracy. The method was tested on public datasets (AVLetters, GRID, AVDigits, and LRW) and compared with several approaches based on complex models and deep architectures; the results indicate that it effectively reduces the high dimensionality of the video frames and shortens training time.
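The abstract describes the descriptor stage only in outline. The sketch below shows one common way to compute quaternion Meixner moments of an RGB lip frame, under loudly stated assumptions. It uses the classical Meixner parameterisation M_n(x; β, c) with β > 0 and 0 < c < 1. How these two parameters relate to the paper's α and β is not given here, so treat that mapping as hypothetical. The polynomials are weighted so that their rows are (approximately, on a finite grid) orthonormal. The colour encoding follows a convention from the quaternion-moment literature: f = R i + G j + B k, right-multiplied by μ = (i + j + k)/√3. The names `meixner_kernel` and `quaternion_meixner_moments` are illustrative and are not the authors' code.

```python
import math

import numpy as np


def meixner_kernel(order: int, size: int, beta: float, c: float) -> np.ndarray:
    """Weighted Meixner polynomials K[n, x] for n < order and x < size.

    Assumes the classical parameterisation M_n(x; beta, c), beta > 0, 0 < c < 1.
    Rows are orthonormal up to truncating the weight to x < size.
    """
    if not (beta > 0 and 0 < c < 1):
        raise ValueError("Meixner parameters must satisfy beta > 0 and 0 < c < 1")
    x = np.arange(size, dtype=float)
    poly = np.zeros((order, size))
    poly[0] = 1.0
    if order > 1:
        poly[1] = 1.0 + (c - 1.0) * x / (c * beta)
    # Three-term recurrence in n:
    # c (n + beta) M_{n+1} = ((c - 1) x + n + (n + beta) c) M_n - n M_{n-1}
    for n in range(1, order - 1):
        poly[n + 1] = (((c - 1.0) * x + n + (n + beta) * c) * poly[n]
                       - n * poly[n - 1]) / (c * (n + beta))
    # log of the weight w(x) = (beta)_x c^x / x!, via lgamma to avoid overflow
    log_w = np.array([math.lgamma(beta + xi) - math.lgamma(beta)
                      - math.lgamma(xi + 1.0) + xi * math.log(c) for xi in x])
    # log of the squared norm rho(n) = n! / (c^n (beta)_n (1 - c)^beta)
    log_rho = np.array([math.lgamma(k + 1.0) - k * math.log(c)
                        - (math.lgamma(beta + k) - math.lgamma(beta))
                        - beta * math.log(1.0 - c) for k in range(order)])
    # K[n, x] = M_n(x) * sqrt(w(x) / rho(n))
    return poly * np.exp(0.5 * (log_w[None, :] - log_rho[:, None]))


def quaternion_meixner_moments(rgb: np.ndarray, order: int, beta: float, c: float):
    """Components (A, B, C, D) of quaternion Meixner moments of an H x W x 3 image.

    Assumed convention: f(y, x) = R i + G j + B k, right-multiplied by the unit
    pure quaternion mu = (i + j + k) / sqrt(3). Each component is order x order.
    """
    height, width, _ = rgb.shape
    k_rows = meixner_kernel(order, height, beta, c)
    k_cols = meixner_kernel(order, width, beta, c)
    # Real Meixner moments per channel: m[n, m] = sum_y sum_x K_n(y) K_m(x) f(y, x)
    m_r, m_g, m_b = (k_rows @ rgb[:, :, ch] @ k_cols.T for ch in range(3))
    s = 1.0 / math.sqrt(3.0)
    # The kernel is real, so f * mu reduces to these channel combinations
    return (-s * (m_r + m_g + m_b), s * (m_g - m_b), s * (m_b - m_r), s * (m_r - m_g))


rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))  # stand-in for a cropped RGB lip-region frame
A, B, C, D = quaternion_meixner_moments(frame, order=8, beta=2.0, c=0.4)
features = np.stack([A, B, C, D])  # one possible (4, order, order) CNN input layout
print(features.shape)  # (4, 8, 8)
```

Stacking the four quaternion components as channels is only one plausible way to feed the moments to a CNN. The key point is that an order × order moment grid is far smaller than the raw frame, which is the dimensionality reduction the abstract credits to the descriptors.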
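The GWO stage searches the two polynomial parameters for the values that give the best classification. Below is a minimal sketch of the standard Grey Wolf Optimizer update rules. The three best positions found so far act as the alpha, beta, and delta wolves, and the coefficient a decays linearly from 2 to 0. Note that GWO's "beta wolf" is unrelated to the polynomial parameter β. In the paper, the fitness would be tied to the accuracy of the moment-fed CNN. Training a network per candidate is out of scope here, so `surrogate_loss` is a hypothetical stand-in with a known minimum. The search bounds are also assumptions.

```python
from typing import Callable

import numpy as np


def grey_wolf_optimize(fitness: Callable[[np.ndarray], float], lower, upper,
                       n_wolves: int = 12, n_iter: int = 100, seed: int = 0):
    """Minimise `fitness` over the box [lower, upper] with Grey Wolf Optimizer updates.

    Returns (best_position, best_score). The alpha, beta and delta wolves are the
    three best positions found so far.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    scores = np.array([fitness(w) for w in wolves])
    top = np.argsort(scores)[:3]
    leaders, leader_scores = wolves[top].copy(), scores[top].copy()
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        estimate = np.zeros_like(wolves)
        for leader in leaders:  # alpha, beta, delta in turn
            A = 2.0 * a * rng.random((n_wolves, dim)) - a
            C = 2.0 * rng.random((n_wolves, dim))
            estimate += leader - A * np.abs(C * leader - wolves)
        # Each wolf moves to the average of the three leader-guided positions
        wolves = np.clip(estimate / 3.0, lower, upper)
        scores = np.array([fitness(w) for w in wolves])
        # Keep the three best positions seen so far as the new leaders
        pool = np.vstack([leaders, wolves])
        pool_scores = np.concatenate([leader_scores, scores])
        top = np.argsort(pool_scores)[:3]
        leaders, leader_scores = pool[top].copy(), pool_scores[top].copy()
    return leaders[0], float(leader_scores[0])


def surrogate_loss(params: np.ndarray) -> float:
    """Hypothetical stand-in for (1 - validation accuracy); minimum at beta=3, c=0.6."""
    beta, c = params
    return float((beta - 3.0) ** 2 + 10.0 * (c - 0.6) ** 2)


# Hypothetical search box for the two Meixner parameters (beta, c)
LOWER, UPPER = [0.5, 0.05], [10.0, 0.95]
best, score = grey_wolf_optimize(surrogate_loss, LOWER, UPPER)
print(best, score)
```

To match the role the abstract gives GWO, one could replace `surrogate_loss` with a function that computes moments using the candidate parameters, trains or fine-tunes the CNN, and returns one minus the validation accuracy. Each fitness evaluation then costs a training run, so a small pack and iteration budget matter in practice.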
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical studies, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.