Intelligence System for Multi-Language Recognition
F. Ramo, Mohammed Kannah
Journal of Education and Science, journal article, published 2022-03-01
DOI: 10.33899/edusj.2022.132223.1200 (https://doi.org/10.33899/edusj.2022.132223.1200)
Citation count: 0
Abstract
Language classification systems identify the spoken language of a given phoneme sample. They are usually the first step in many spoken-language processing tasks, such as automatic speech recognition (ASR). Without automatic language detection, speech cannot be analyzed correctly and the appropriate grammar rules cannot be applied, so the subsequent speech recognition steps fail. We propose a language classification system that treats the problem in the image domain rather than the audio domain. This research identified and implemented several low-level features based on Mel Frequency Cepstral Coefficients (MFCCs). These features were extracted from speech files in four languages (Arabic, English, French and Kurdish) taken from the M2L_Dataset database, which served as the data source for this research. A Convolutional Neural Network (CNN) operates on spectrogram images of the available audio snippets. In extensive experiments, we showed that our model is applicable to a range of noisy scenarios and can easily be extended to previously unseen languages while maintaining classification accuracy. We release our code and an extensive training package for language classification systems to the community. The CNN classifier performed well. For two languages, classification accuracy reached 97% with 1-second samples and 98% with 2-second samples. For three languages, it reached 95% with 1-second samples and 96% with 2-second samples.
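The abstract does not state the feature-extraction settings. The following is a hedged, dependency-free sketch of a conventional MFCC pipeline, not the authors' code. It first cuts a recording into non-overlapping snippets of 1 s or 2 s, the two sample lengths reported above. It then computes an MFCC matrix for each snippet: pre-emphasis, Hamming-windowed frames, power spectrum, mel filterbank, log and DCT. The result is a 2-D time-by-coefficient array that a CNN can treat as an image.

Several choices are assumed textbook defaults rather than values from the paper: the 8 kHz sample rate, 25 ms frames with a 10 ms hop, 26 mel filters and 13 coefficients. The helper names (`split_into_snippets`, `mel_filterbank`, `power_spectrum`, `mfcc`) are hypothetical. The naive DFT keeps the example self-contained; a real pipeline would use a vectorized FFT.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def split_into_snippets(samples, sample_rate, seconds):
    """Cut a recording into non-overlapping snippets of `seconds` length (hypothetical helper).

    A trailing partial snippet is dropped.
    """
    size = int(sample_rate * seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale between 0 Hz and Nyquist."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):  # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft, twiddle):
    """Periodogram |X[k]|^2 / n_fft for k = 0..n_fft/2 via a naive DFT.

    Zero padding up to n_fft is implicit because padded samples contribute nothing.
    """
    return [
        abs(sum(x * twiddle[(k * n) % n_fft] for n, x in enumerate(frame))) ** 2 / n_fft
        for k in range(n_fft // 2 + 1)
    ]


def mfcc(samples, sample_rate=8000, frame_ms=25, hop_ms=10, n_fft=256, n_filters=26, n_coeffs=13):
    """Return a (frames x n_coeffs) MFCC matrix; all default parameters are assumptions."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if frame_len > n_fft:
        raise ValueError("frame is longer than the FFT size")
    # Pre-emphasis y[n] = x[n] - 0.97 x[n-1] boosts high frequencies.
    emph = [samples[0]] + [samples[i] - 0.97 * samples[i - 1] for i in range(1, len(samples))]
    # Hamming window reduces spectral leakage at frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(emph[start:start + frame_len], window)]
        power = power_spectrum(frame, n_fft, twiddle)
        # Log energy in each mel band; the small floor avoids log(0) on silence.
        log_e = [math.log(sum(p * f for p, f in zip(power, row)) + 1e-10) for row in bank]
        # DCT-II decorrelates the log energies; keep the first n_coeffs coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
            for c in range(n_coeffs)
        ])
    return features
```

For example, a 2.5 s recording yields two 1 s snippets or one 2 s snippet. Each 1 s snippet at 8 kHz then becomes a 98 × 13 feature matrix under these framing assumptions.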
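The abstract likewise does not describe the CNN architecture. The sketch below is a hypothetical, untrained forward pass, not the authors' model. It shows how a 2-D feature image, such as a spectrogram or MFCC matrix, can be mapped to one probability per language. The layers are a single 3×3 convolution with ReLU, global average pooling, a dense layer and softmax. The class name `TinyLanguageCNN`, the filter count and the Gaussian random initialisation are assumptions for illustration. In practice the weights would be learned with a deep-learning framework, and the network would be deeper.

```python
import math
import random


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, which deep-learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyLanguageCNN:
    """Hypothetical Conv(3x3) -> ReLU -> global average pool -> dense -> softmax.

    Layer sizes and random initialisation are illustrative assumptions; weights are untrained.
    """

    def __init__(self, languages, n_filters=8, seed=0):
        rng = random.Random(seed)
        self.languages = list(languages)
        self.kernels = [
            [[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(3)] for _ in range(n_filters)
        ]
        self.biases = [0.0] * n_filters
        self.dense = [[rng.gauss(0, 0.1) for _ in range(n_filters)] for _ in self.languages]
        self.dense_bias = [0.0] * len(self.languages)

    def forward(self, feature_map):
        """Return a {language: probability} dict for one 2-D feature map."""
        pooled = []
        for kernel, bias in zip(self.kernels, self.biases):
            # Convolve, apply ReLU, then average the whole activation map to one number.
            act = [[max(0.0, v + bias) for v in row] for row in conv2d_valid(feature_map, kernel)]
            pooled.append(sum(map(sum, act)) / (len(act) * len(act[0])))
        scores = [
            sum(w * p for w, p in zip(weights, pooled)) + b
            for weights, b in zip(self.dense, self.dense_bias)
        ]
        return dict(zip(self.languages, softmax(scores)))

    def predict(self, feature_map):
        """Return the language with the highest probability."""
        probs = self.forward(feature_map)
        return max(probs, key=probs.get)
```

Global average pooling makes the output independent of the feature map's size, so the same sketch accepts 1 s and 2 s snippets without changing the dense layer.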