Wisam H. Ali, Thamir R. Saeed, Mahmuod H. Al-Muifraje
{"title":"基于NVGRAM-WNN的视觉语音识别系统的FPGA实现","authors":"Wisam H. Ali, Thamir R. Saeed, Mahmuod H. Al-Muifraje","doi":"10.1109/CSASE48920.2020.9142095","DOIUrl":null,"url":null,"abstract":"Visual identification is an exciting field because it reflects the primary form of understanding of objects used by humans. At the beginning of artificial intelligent technology, multiple experiments were suggested by the researcher to develop a computer image recogniser similar to human recognition. One such application is in the speech recognition system in a noisy environment, where the visual cue representing the movement of the lips contains some essential information added to the audio signal, as well as how the person merges audio-visual stimuli to identify output words. A little, but unresolved, part of this problem is the classification of utterance using only the visual signals without the speaker’s acoustic signal being available. Taking into account a collection of frames from a recorded video for a person speaking a word; a robust image processing technique is adopted to isolate the region of the lips; then extracting correct geometric characteristics that reflect the variation of the mouth shape during the speech. The observed features are utilised by the identification stage to identify the spoken word. This paper aims to solve this problem by introducing a new segmentation technique to isolate the area of the lips together with a collection of visual shape features centred on the boundary of the extracted lips that can read the lips with significant results. Weightless neural network classifier is proposed to enhance the utterance identification with hardware implementation based on FPGA. 
Furthermore, A specialised laboratory is designed to collect the utterance of twenty-six English letters from thirty speakers who are adopted in this paper.","PeriodicalId":254581,"journal":{"name":"2020 International Conference on Computer Science and Software Engineering (CSASE)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FPGA Implementation of Visual Speech Recognition System based on NVGRAM-WNN\",\"authors\":\"Wisam H. Ali, Thamir R. Saeed, Mahmuod H. Al-Muifraje\",\"doi\":\"10.1109/CSASE48920.2020.9142095\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Visual identification is an exciting field because it reflects the primary form of understanding of objects used by humans. At the beginning of artificial intelligent technology, multiple experiments were suggested by the researcher to develop a computer image recogniser similar to human recognition. One such application is in the speech recognition system in a noisy environment, where the visual cue representing the movement of the lips contains some essential information added to the audio signal, as well as how the person merges audio-visual stimuli to identify output words. A little, but unresolved, part of this problem is the classification of utterance using only the visual signals without the speaker’s acoustic signal being available. Taking into account a collection of frames from a recorded video for a person speaking a word; a robust image processing technique is adopted to isolate the region of the lips; then extracting correct geometric characteristics that reflect the variation of the mouth shape during the speech. The observed features are utilised by the identification stage to identify the spoken word. 
This paper aims to solve this problem by introducing a new segmentation technique to isolate the area of the lips together with a collection of visual shape features centred on the boundary of the extracted lips that can read the lips with significant results. Weightless neural network classifier is proposed to enhance the utterance identification with hardware implementation based on FPGA. Furthermore, A specialised laboratory is designed to collect the utterance of twenty-six English letters from thirty speakers who are adopted in this paper.\",\"PeriodicalId\":254581,\"journal\":{\"name\":\"2020 International Conference on Computer Science and Software Engineering (CSASE)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 International Conference on Computer Science and Software Engineering (CSASE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CSASE48920.2020.9142095\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Computer Science and Software Engineering (CSASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSASE48920.2020.9142095","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
FPGA Implementation of Visual Speech Recognition System based on NVGRAM-WNN
Visual identification is an exciting field because it reflects the primary way humans understand objects. Since the early days of artificial intelligence, researchers have carried out many experiments to develop computer image recognisers that approach human recognition. One such application is speech recognition in noisy environments, where the visual cue representing the movement of the lips carries essential information that complements the audio signal, much as a person merges audio-visual stimuli to identify spoken words. A small but unresolved part of this problem is the classification of utterances using only visual signals, without the speaker's acoustic signal being available. Given a collection of frames from a recorded video of a person speaking a word, a robust image-processing technique is adopted to isolate the lip region, and geometric features that reflect the variation of the mouth shape during speech are then extracted. The identification stage uses these features to identify the spoken word. This paper addresses the problem by introducing a new segmentation technique to isolate the lip area, together with a collection of visual shape features centred on the boundary of the extracted lips, enabling lip reading with significant results. A weightless neural network (WNN) classifier is proposed to enhance utterance identification, with a hardware implementation based on an FPGA. Furthermore, a specialised laboratory was designed to collect utterances of the twenty-six English letters from thirty speakers, which are adopted in this paper.
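The abstract describes extracting geometric characteristics of the mouth shape from a binary lip-region mask. As an illustration only (the paper's actual boundary-based shape features are not detailed in the abstract), a minimal sketch of such descriptors, assuming a per-frame binary mask where 1 marks a lip pixel, might look like:

```python
import numpy as np

def lip_geometric_features(mask):
    """Return simple geometric descriptors of a binary lip mask
    (H x W array, 1 = lip pixel): mouth width, height, aspect
    ratio, and area. Hypothetical stand-ins for the paper's
    boundary-centred shape features."""
    rows = np.any(mask, axis=1)          # rows containing lip pixels
    cols = np.any(mask, axis=0)          # columns containing lip pixels
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    height = int(bottom - top + 1)
    width = int(right - left + 1)
    return {
        "width": width,
        "height": height,
        "aspect": width / height,        # openness of the mouth shape
        "area": int(mask.sum()),
    }
```

Computing such a feature vector per frame yields a sequence describing how the mouth shape varies across the utterance, which is what the identification stage consumes.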
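The classifier named in the title, NVGRAM-WNN, belongs to the family of weightless neural networks, whose best-known form is the WiSARD model: each class has a discriminator made of RAM nodes, each node addressed by a fixed random tuple of input bits, so training is a single memory write and classification counts how many nodes recognise their address. The sketch below is a generic WiSARD-style classifier for binarised feature vectors, not the authors' specific NVGRAM variant; all names and parameters are illustrative. This memory-lookup structure is also what makes such networks attractive for FPGA implementation.

```python
import numpy as np

class Discriminator:
    """One discriminator per class: a set of sparse RAM nodes, each
    addressed by a fixed random tuple of bits from the input."""
    def __init__(self, input_bits, tuple_size, rng):
        order = rng.permutation(input_bits)      # random bit mapping
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, input_bits, tuple_size)]
        self.rams = [dict() for _ in self.tuples]  # sparse RAM nodes

    def _addresses(self, x):
        # Turn each bit-tuple of x into an integer RAM address
        for tup, ram in zip(self.tuples, self.rams):
            yield ram, int("".join(str(b) for b in x[tup]), 2)

    def train(self, x):
        for ram, addr in self._addresses(x):
            ram[addr] = 1                        # one memory write

    def response(self, x):
        # Count RAM nodes that have seen this address during training
        return sum(ram.get(addr, 0) for ram, addr in self._addresses(x))

class WiSARD:
    """Weightless classifier: the class whose discriminator fires on
    the most RAM nodes wins."""
    def __init__(self, input_bits, tuple_size=4, seed=0):
        self.input_bits = input_bits
        self.tuple_size = tuple_size
        self.rng = np.random.default_rng(seed)
        self.discriminators = {}

    def train(self, x, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(
                self.input_bits, self.tuple_size, self.rng)
        self.discriminators[label].train(np.asarray(x))

    def classify(self, x):
        x = np.asarray(x)
        return max(self.discriminators,
                   key=lambda lbl: self.discriminators[lbl].response(x))
```

In the paper's setting, the binarised lip-shape feature sequence for each spoken letter would be presented to twenty-six such discriminators, one per English letter; because each node is just a lookup table, the whole classifier maps naturally onto FPGA block RAM.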