FPGA Implementation of Visual Speech Recognition System based on NVGRAM-WNN

Wisam H. Ali, Thamir R. Saeed, Mahmuod H. Al-Muifraje
2020 International Conference on Computer Science and Software Engineering (CSASE), published 2020-04-01
DOI: 10.1109/CSASE48920.2020.9142095 (https://doi.org/10.1109/CSASE48920.2020.9142095)
Citations: 1

Abstract

Visual identification is an exciting field because it reflects the primary way humans understand objects. Since the early days of artificial intelligence, researchers have proposed many experiments aimed at building a computer image recogniser comparable to human recognition. One application is speech recognition in noisy environments, where the visual cue given by lip movement carries essential information that complements the audio signal, and where the way a person merges audio-visual stimuli to identify words is also of interest. A small but unresolved part of this problem is classifying utterances from the visual signal alone, when the speaker's acoustic signal is unavailable. Given a sequence of frames from a recorded video of a person speaking a word, a robust image processing technique isolates the lip region, and geometric features that reflect the variation of the mouth shape during speech are then extracted. The identification stage uses these features to identify the spoken word. This paper addresses the problem by introducing a new segmentation technique to isolate the lip area, together with a set of visual shape features centred on the boundary of the extracted lips, which the authors report can read lips with significant results. A weightless neural network classifier is proposed to improve utterance identification, with a hardware implementation based on FPGA. Furthermore, a specialised laboratory was set up to collect utterances of the twenty-six English letters from thirty speakers, and these recordings are used in this paper.
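The abstract does not spell out how the weightless classifier works, so the following is only a minimal sketch of a generic VG-RAM-style weightless network, not the authors' NVGRAM design or their FPGA implementation. It assumes the lip shape features have already been binarised into a fixed-length bit vector. The class names `VGRamNeuron` and `VGRamNetwork`, the parameters, and the toy data are all hypothetical. Training only writes (input, label) pairs into each neuron's memory, and recall returns the label of the stored pattern nearest in Hamming distance, with a majority vote across neurons.

```python
import random
from collections import Counter


class VGRamNeuron:
    """One RAM-style neuron that sees a fixed subset of the input bits."""

    def __init__(self, synapses):
        self.synapses = synapses  # indices of the input bits this neuron samples
        self.memory = []          # stored (bit_view, label) pairs

    def _view(self, bits):
        # Project the full input onto this neuron's sampled bit positions.
        return tuple(bits[i] for i in self.synapses)

    def train(self, bits, label):
        # Training is a single memory write; no weights are adjusted.
        self.memory.append((self._view(bits), label))

    def recall(self, bits):
        # Return the label of the stored view closest in Hamming distance.
        view = self._view(bits)
        best = min(
            self.memory,
            key=lambda entry: sum(a != b for a, b in zip(entry[0], view)),
        )
        return best[1]


class VGRamNetwork:
    """A layer of neurons with random sparse connections and majority voting."""

    def __init__(self, input_size, n_neurons=16, synapses_per_neuron=8, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the wiring reproducible
        self.neurons = [
            VGRamNeuron(rng.sample(range(input_size), synapses_per_neuron))
            for _ in range(n_neurons)
        ]

    def train(self, bits, label):
        for neuron in self.neurons:
            neuron.train(bits, label)

    def predict(self, bits):
        votes = Counter(neuron.recall(bits) for neuron in self.neurons)
        return votes.most_common(1)[0][0]


# Toy usage with made-up 16-bit feature vectors for two letters.
net = VGRamNetwork(input_size=16)
net.train([0] * 16, "A")
net.train([1] * 16, "B")
print(net.predict([0] * 15 + [1]))  # one bit away from the "A" pattern
```

Because recall reduces to bit comparisons and memory lookups, with no multiplications, this family of classifiers is commonly regarded as a natural fit for FPGA logic. The resource and accuracy figures of the actual system are reported in the paper itself.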