Shehan Irteza Pranto, Rahad Arman Nabid, Ahnaf Mozib Samin, Nabeel Mohammed, F. Sarker, M. N. Huda, K. Mamun
Published in: 2021 3rd International Conference on Electrical & Electronic Engineering (ICEEE), 2021-12-22. DOI: 10.1109/ICEEE54059.2021.9718797 (https://doi.org/10.1109/ICEEE54059.2021.9718797)
Human-Robot Interaction in Bengali language for Healthcare Automation integrated with Speaker Recognition and Artificial Conversational Entity
This study presents a Human-Robot Interaction (HRI) architecture built around an Artificial Conversational Entity with integrated speaker recognition, intended to give patients access to modern healthcare services. During the Covid-19 pandemic, visiting hospitals became difficult for health workers and patients because of the high risk of virus transmission. To reduce crowding, our architecture offers a cost-effective way to automate the hospital reception system through AI-based HRI and to provide fast healthcare services in the context of Bangladesh. The architecture consists of two main subsystems: Speaker Recognition, and an Artificial Conversational Entity comprising Bengali Automatic Speech Recognition (ASR), an Interactive Agent, and Text-to-Speech synthesis. We used MFCC features as the acoustic parameters and a GMM statistical model, fitted with the expectation-maximization (EM) algorithm, to model each speaker’s voice and identify the speaker. The speaker recognition module achieved 94.38% average accuracy in noisy environments and 96.27% average accuracy in studio-quality environments. The RNN-based Deep Speech 2 model used for Bangla ASR achieved a word error rate (WER) of 42.15%. In addition, the Artificial Conversational Entity achieved an average accuracy of 98.58% in a small-scale real-time environment.
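For readers unfamiliar with the WER metric reported above: it is conventionally defined as the word-level edit distance between the reference transcript and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate` and the example sentences are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization
    (punctuation stripping, case folding) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A WER of 42.15% therefore means roughly 42 word-level errors per 100 reference words, not that 42% of utterances were wrong.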