Spectral-temporal receptive fields and MFCC balanced feature extraction for noisy speech recognition

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific. Pub Date: 2014-12-01. DOI: 10.1109/APSIPA.2014.7041624

Jia-Ching Wang, Chang-Hong Lin, En-Ting Chen, P. Chang


Citation count: 4

Abstract


This paper proposes a new set of acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is an analysis method for studying a physiological model of the mammalian auditory system in the spectral-temporal domain. It has two components: the rate (in Hz), which represents the temporal response, and the scale (in cycles/octave), which represents the spectral response. From the obtained STRF, an acoustic feature is derived in three steps. First, the energy of each scale is calculated from the STRF. A logarithm is then applied to the scale energies. Finally, the discrete cosine transform is applied to generate the proposed STRF feature. In the experiments, the proposed STRF feature is combined with conventional Mel-frequency cepstral coefficients (MFCCs) to verify its effectiveness. In a noise-free environment, the proposed feature can increase the recognition rate by 17.48%. In noisy environments, the increase in the recognition rate ranges from 5% to 12%.
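The three steps named in the abstract (per-scale energy, logarithm, discrete cosine transform) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the scale energy is aggregated, so the sketch sums squared magnitudes over rate and frequency, and it treats "combining" with MFCCs as simple concatenation. The input layout `strf_frame[s][r][f]` and all function names are hypothetical.

```python
import math


def scale_energies(strf_frame):
    """Energy of each scale for one frame.

    strf_frame[s][r][f] is a hypothetical STRF response at scale index s,
    rate index r and frequency channel f. Assumption: energy is the sum of
    squared magnitudes over all rates and frequency channels.
    """
    return [
        sum(abs(v) ** 2 for rate_row in scale_block for v in rate_row)
        for scale_block in strf_frame
    ]


def dct_ii(x, n_coeffs=None):
    """Unnormalised DCT-II of the sequence x, keeping the first n_coeffs terms."""
    n = len(x)
    k_max = n if n_coeffs is None else n_coeffs
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(k_max)
    ]


def strf_feature(strf_frame, n_coeffs=None, floor=1e-10):
    """Scale energies -> log -> DCT, following the order given in the abstract."""
    energies = scale_energies(strf_frame)
    # The floor avoids log(0) for silent scales; its value is an assumption.
    log_energies = [math.log(max(e, floor)) for e in energies]
    return dct_ii(log_energies, n_coeffs)


def combine_with_mfcc(mfcc_frame, strf_frame, n_coeffs=None):
    """Assumed combination: append the STRF feature to the MFCC vector."""
    return list(mfcc_frame) + strf_feature(strf_frame, n_coeffs)


# Toy frame with 2 scales, 2 rates and 2 frequency channels.
toy_frame = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
toy_feature = strf_feature(toy_frame)
toy_combined = combine_with_mfcc([0.1, 0.2, 0.3], toy_frame)
```

The logarithm followed by a DCT mirrors the last two stages of the usual MFCC computation, which is why the resulting vector can sit alongside MFCCs in one feature vector; the MFCC values in the toy call are placeholders, not real coefficients.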
