Speaker identification in noisy environment using bispectrum analysis and probabilistic neural network

Proceedings Fourth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2001. Pub Date: 2001-10-30. DOI: 10.1109/ICCIMA.2001.970480

B. Kusumoputro, A. Triyanto, M. I. Fanany, W. Jatmiko

Citations: 12

Abstract

This paper describes the use of neural processing to extract bispectrum features from speech data, and the use of a probabilistic neural network (PNN) as the classifier in an automatic speech recognition system. Early speech recognition systems commonly relied on power spectrum analysis for feature extraction; however, the recognition rate of such systems is not high enough, especially when Gaussian noise is added to the speech utterances. In this paper, we develop a speaker identification system based on bispectrum feature analysis. To analyse the distribution of the bispectrum data over its two-dimensional representation, we develop an adaptive feature extraction mechanism for bispectrum speech data based on a cascade neural network. A cascade configuration of SOFM (Self-Organizing Feature Map) and LVQ (Learning Vector Quantization) serves as an adaptive codebook generation algorithm that determines the feature distribution of the bispectrum speech data. The K-L transformation (K-LT) is then applied as a preprocessing step before the neural classifier. The K-LT is shown to be an effective procedure for orthogonalization and dimensionality reduction of the codebook vectors generated from the bispectrum data. Experimental results show that the system achieves a high recognition rate on undirected utterance speech, especially when a larger number of codebook vectors is used. They also show that using the PNN increases the recognition rate significantly, even for speech data with added Gaussian noise.
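As a rough illustration of two of the stages named in the abstract, the sketch below computes a direct bispectrum estimate and classifies a feature vector with a probabilistic neural network. It is not the authors' implementation: the abstract gives no estimator details, kernel width, or data, so the function names (`dft`, `bispectrum`, `pnn_classify`), the smoothing parameter `sigma`, the frame averaging, and the toy vectors are all assumptions made for the example. The SOFM/LVQ codebook stage and the K-LT step are omitted.

```python
import cmath
import math


def dft(frame):
    """Plain O(N^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def bispectrum(frames):
    """Direct bispectrum estimate averaged over equal-length frames:
    B(f1, f2) = mean over frames of X(f1) * X(f2) * conj(X(f1 + f2))."""
    n = len(frames[0])
    acc = [[0j] * n for _ in range(n)]
    for frame in frames:
        spec = dft(frame)
        for f1 in range(n):
            for f2 in range(n):
                # Frequency indices wrap around because the DFT is periodic.
                acc[f1][f2] += spec[f1] * spec[f2] * spec[(f1 + f2) % n].conjugate()
    return [[value / len(frames) for value in row] for row in acc]


def pnn_classify(x, train, sigma=0.5):
    """PNN decision rule: one Gaussian kernel per training vector, averaged
    per class (equal priors assumed); the class with the largest score wins."""
    scores = {}
    for label, vectors in train.items():
        total = 0.0
        for v in vectors:
            dist2 = sum((a - b) ** 2 for a, b in zip(x, v))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        scores[label] = total / len(vectors)
    return max(scores, key=scores.get), scores


# Toy frames (hypothetical): a unit impulse and the same impulse shifted by one sample.
b = bispectrum([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])

# Toy feature vectors (hypothetical) standing in for K-LT-reduced codebook vectors.
train = {
    "speaker_a": [[0.0, 0.1], [0.2, 0.0]],
    "speaker_b": [[1.0, 1.1], [0.9, 1.0]],
}
label, scores = pnn_classify([0.1, 0.1], train)
print(label)  # speaker_a
```

The two toy frames differ only by a time shift, and the triple product cancels the resulting linear phase, so both contribute the same bispectrum value. In the PNN, `sigma` controls how sharply each training vector influences the class score; a real system would tune it on held-out data.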
