DenseRecognition of Spoken Languages

2020 25th International Conference on Pattern Recognition (ICPR) Pub Date : 2021-01-10 DOI:10.1109/ICPR48806.2021.9412413

Jaybrata Chakraborty, Bappaditya Chakraborty, U. Bhattacharya

{"title":"DenseRecognition of Spoken Languages","authors":"Jaybrata Chakraborty, Bappaditya Chakraborty, U. Bhattacharya","doi":"10.1109/ICPR48806.2021.9412413","DOIUrl":null,"url":null,"abstract":"In the present study, we have considered a large number (27) of Indian languages for recognition from their speech signals of different sources. A dense convolutional network architecture (DenseNet) has been used for this classification task. Dynamic elimination of low energy frames from the input speech signal has been considered as a preprocessing operation. Mel-spectrogram of pre-processed speech signal is fed as input to the DenseNet architecture. Language recognition performance of this architecture has been compared with that of several state-of-the-art deep architectures which include a convolutional neural network (CNN), ResNet, CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Additionally, we obtained recognition performances of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purpose. Simulations for both speaker independent and speaker dependent scenarios have been performed on two different standard datasets which include (i) IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 different Indian languages. In each case, recognition performance of the DenseNet architecture along with Mel-spectrogram features has been found to be significantly better than all other frameworks implemented in this study.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"78 1","pages":"9674-9681"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th International Conference on Pattern Recognition (ICPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPR48806.2021.9412413","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

In the present study, we have considered a large number (27) of Indian languages for recognition from their speech signals of different sources. A dense convolutional network architecture (DenseNet) has been used for this classification task. Dynamic elimination of low energy frames from the input speech signal has been considered as a preprocessing operation. Mel-spectrogram of pre-processed speech signal is fed as input to the DenseNet architecture. Language recognition performance of this architecture has been compared with that of several state-of-the-art deep architectures which include a convolutional neural network (CNN), ResNet, CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Additionally, we obtained recognition performances of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purpose. Simulations for both speaker independent and speaker dependent scenarios have been performed on two different standard datasets which include (i) IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 different Indian languages. In each case, recognition performance of the DenseNet architecture along with Mel-spectrogram features has been found to be significantly better than all other frameworks implemented in this study.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

口语的密集识别

在本研究中，我们考虑了大量(27)种印度语言从不同来源的语音信号中进行识别。一个密集卷积网络架构(DenseNet)被用于这个分类任务。从输入语音信号中动态消除低能量帧被认为是一种预处理操作。将预处理语音信号的mel谱图作为DenseNet体系结构的输入。将该体系结构的语言识别性能与卷积神经网络(CNN)、ResNet、CNN- blstm和DenseNet-BLSTM混合体系结构等几种最先进的深度体系结构进行了比较。此外，为了进行比较，我们获得了由不同组手工特征馈送的堆叠BLSTM体系结构的识别性能。在两种不同的标准数据集上进行了演讲者独立和演讲者依赖场景的模拟，其中包括(i) 27种不同印度语言的IITKGP-MLILSC新闻片段数据集和(ii)语言数据联盟(LDC) 5种不同印度语言的电话对话数据集。在每种情况下，DenseNet架构以及mel谱图特征的识别性能都明显优于本研究中实现的所有其他框架。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2020 25th International Conference on Pattern Recognition (ICPR)

自引率

0.00%

发文量

期刊最新文献

Trajectory representation learning for Multi-Task NMRDP planning Semantic Segmentation Refinement Using Entropy and Boundary-guided Monte Carlo Sampling and Directed Regional Search A Randomized Algorithm for Sparse Recovery An Empirical Bayes Approach to Topic Modeling To Honor our Heroes: Analysis of the Obituaries of Australians Killed in Action in WWI and WWII