Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features.
Malefia Demilie Melese, Amlakie Aschale Alemu, Ayodeji Olalekan Salau, Ibrahim Gashaw Kasa
Network: Computation in Neural Systems, pp. 1-23, published 2024-06-04
DOI: 10.1080/0954898X.2024.2359610
Natural language is frequently used for information exchange between humans and computers in modern digital environments, and Natural Language Processing (NLP) is a basic requirement for technological progress in speech recognition. Language identification (LID) is a prerequisite for further NLP tasks such as speech-to-text translation, speech-to-speech translation, speaker recognition, and speech information retrieval. In this paper, we developed an LID model for Ethio-Semitic languages. The model combines a hybrid architecture, a convolutional recurrent neural network (CRNN), with mixed input features: Mel-frequency cepstral coefficients (MFCCs) and the mel-spectrogram. The study focused on four Ethio-Semitic languages: Amharic, Ge'ez, Guragigna, and Tigrinya. Using data augmentation, we expanded the original dataset from 8 h of audio to 24 h 40 min. With the proposed feature combination, the model achieved average accuracies of 98.1% on the test set, 98.6% on the validation set, and 99.9% on the training set. The CRNN with combined mel-spectrogram + MFCC features achieved the best results compared with other existing models.
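The abstract names two feature types, MFCCs and the mel-spectrogram, that are mixed before they reach the CRNN. The sketch below shows one way such a hybrid input could be computed with NumPy alone. Everything specific here is an assumption for illustration, not the authors' configuration: the function names are hypothetical, and the 512-sample frame, 160-sample hop (10 ms at 16 kHz), 40 mel bands, 13 MFCCs, HTK-style mel formula, and per-frame concatenation as the fusion step are not stated in the abstract.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; the abstract does not say which variant is used)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping an rfft power spectrum to n_mels mel bands."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)            # (n_fft // 2 + 1,)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at 1
    return fb


def log_mel_spectrogram(y, sr, n_fft=512, hop=160, n_mels=40):
    """Hann-windowed STFT power, projected onto mel bands, then log-compressed."""
    if len(y) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop                   # no padding at the edges
    frames = np.stack([y[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T         # (n_frames, n_mels)
    return np.log(mel + 1e-10)                                # epsilon avoids log(0)


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, the usual transform from log-mel energies to MFCCs."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)                                  # DC row scaling for orthonormality
    return basis


def mfcc(log_mel, n_mfcc=13):
    return log_mel @ dct_ii_matrix(n_mfcc, log_mel.shape[1]).T  # (n_frames, n_mfcc)


def hybrid_features(y, sr, n_mels=40, n_mfcc=13):
    """Hypothetical fusion: concatenate log-mel and MFCC vectors frame by frame."""
    log_mel = log_mel_spectrogram(y, sr, n_mels=n_mels)
    coeffs = mfcc(log_mel, n_mfcc=n_mfcc)
    return np.concatenate([log_mel, coeffs], axis=1)          # (n_frames, n_mels + n_mfcc)


if __name__ == "__main__":
    sr = 16000
    y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)        # 1 s synthetic tone
    print(hybrid_features(y, sr).shape)                       # (97, 53)
```

Concatenating along the feature axis keeps one time step per frame, so a convolutional front end can treat the result as a 2-D time-frequency image while recurrent layers read it frame by frame. In practice, librosa's `feature.melspectrogram` and `feature.mfcc` provide equivalent, more complete implementations.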
About the journal:
Network: Computation in Neural Systems welcomes submissions of research papers that integrate theoretical neuroscience with experimental data, with an emphasis on cutting-edge technologies. We invite authors and researchers to contribute their work in the following areas:
Theoretical Neuroscience: This section encompasses neural network modeling approaches that elucidate brain function.
Neural Networks in Data Analysis and Pattern Recognition: We encourage submissions exploring the use of neural networks for data analysis and pattern recognition, including but not limited to image analysis and speech processing applications.
Neural Networks in Control Systems: This category covers the use of neural networks in control systems, including robotics, state estimation, fault detection, and diagnosis.
Analysis of Neurophysiological Data: We invite submissions focusing on the analysis of neurophysiology data obtained from experimental studies involving animals.
Analysis of Experimental Data on the Human Brain: This section includes papers analyzing experimental data from studies on the human brain, utilizing imaging techniques such as MRI, fMRI, EEG, and PET.
Neurobiological Foundations of Consciousness: We encourage submissions exploring the neural bases of consciousness in the brain and its simulation in machines.