Md. Tahsin Tausif, S. A. Chowdhury, Md. Shiplu Hawlader, Mohammed Hasanuzzaman, Hasnain Heickal
Published in: 2018 5th International Conference on Computational Science/ Intelligence and Applied Informatics (CSII), 2018-07-01. DOI: 10.1109/CSII.2018.00016 (https://doi.org/10.1109/CSII.2018.00016)
Deep Learning Based Bangla Speech-to-Text Conversion
Speech-to-text conversion is the process of recognizing speech in audio and producing a text transcript of it. Because speech is such an intuitive medium of communication, this technology can have far-reaching effects in easing the interaction between humans and machines. This paper presents a complete speech-to-text conversion system for the Bangla language (also known as Bengali) using deep recurrent neural networks. Possible optimizations such as Broken Language Format, which is based on properties of the Bangla language, are proposed to reduce the training time of the network. A simple deep recurrent neural network architecture is used for speech recognition. Trained on collected data, it yielded over 95% accuracy on the training data and 50% accuracy on the testing data.
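The abstract does not give the network's layout, so the following is only a minimal sketch of what a "simple deep recurrent neural network" forward pass over audio feature frames can look like. It assumes plain Elman-style tanh layers stacked on top of each other; the layer count, the sizes (`feature_size`, `hidden_size`, `num_frames`), the random weights, and all function names are hypothetical illustrations, not the authors' architecture or data.

```python
import math
import random


def make_layer(input_size, hidden_size, rng):
    """Create small random weights for one simple (Elman-style) recurrent layer."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_in": matrix(hidden_size, input_size),    # input -> hidden weights
        "w_rec": matrix(hidden_size, hidden_size),  # hidden -> hidden weights
        "bias": [0.0] * hidden_size,
    }


def run_layer(layer, frames):
    """Run one recurrent layer over a sequence; return one hidden vector per frame."""
    hidden = [0.0] * len(layer["bias"])
    outputs = []
    for frame in frames:
        new_hidden = []
        for i, b in enumerate(layer["bias"]):
            total = b
            total += sum(w * x for w, x in zip(layer["w_in"][i], frame))
            total += sum(w * h for w, h in zip(layer["w_rec"][i], hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        outputs.append(hidden)
    return outputs


def deep_rnn_forward(layers, frames):
    """Stack layers: each layer's output sequence is the next layer's input."""
    sequence = frames
    for layer in layers:
        sequence = run_layer(layer, sequence)
    return sequence


rng = random.Random(0)
feature_size, hidden_size, num_frames = 13, 8, 5  # hypothetical sizes
layers = [
    make_layer(feature_size, hidden_size, rng),
    make_layer(hidden_size, hidden_size, rng),
    make_layer(hidden_size, hidden_size, rng),
]
# Random stand-ins for per-frame audio features (not real speech data).
frames = [[rng.uniform(-1.0, 1.0) for _ in range(feature_size)] for _ in range(num_frames)]
outputs = deep_rnn_forward(layers, frames)
print(len(outputs), len(outputs[0]))
```

In a real recognizer the top layer's per-frame vectors would feed an output layer over characters or phonemes and the weights would be learned from transcribed audio; both steps are omitted here because the abstract does not describe them.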