Multiclass Language Identification Using CNN-Bigru-Attention Model on Spectrogram of Audio Signals
Ma Xueli, Mijit Ablimit, A. Hamdulla
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), published 2021-07-16
DOI: 10.1109/PRML52754.2021.9520702 (https://doi.org/10.1109/PRML52754.2021.9520702)
To address the low recognition rate and the uneven distribution of language information in language identification tasks, a language recognition method based on the CNN-Bigru-Attention model is proposed. The method first extracts the spectrogram of the audio signal and converts it into a gray-scale spectrogram, which serves as the model input. A CNN (convolutional neural network) then captures local features, and a Bigru (bidirectional gated recurrent unit) extracts temporal features. Both the local and the temporal features are passed to an attention layer, which focuses on information related to the language features and suppresses irrelevant information. Finally, the language class is output by a fully connected layer. Experiments on the Common Voice dataset show that the method achieves good results and improves language identification performance.
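
The abstract does not say which form of attention the model uses, so the following is only a minimal sketch of the general idea of weighting time steps before classification. It assumes dot-product scoring against a single context vector; the names `attention_pool`, `features`, and `context` and the toy numbers are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, context):
    """Collapse a sequence of per-frame feature vectors into one vector.

    features: list of T vectors (each a list of D floats), standing in for
              the outputs of the recurrent layer (an assumption).
    context:  list of D floats, a hypothetical learned scoring vector.
    Returns (pooled_vector, weights).
    """
    # Score each time step by its dot product with the context vector.
    scores = [sum(f * c for f, c in zip(frame, context)) for frame in features]
    # Normalize the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted sum over time: high-scoring frames dominate the result.
    dim = len(context)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three time steps with two features each (made-up numbers).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 1.0]
pooled, weights = attention_pool(features, context)
print(weights)
print(pooled)
```

In this toy input the third frame has the largest dot product with `context`, so it receives the largest weight. In a trained model, the pooled vector would be the input to the fully connected classification layer.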