Discrete cosine transform-derived spectrum-based speech enhancement algorithm using temporal-domain multiband filtering
M. Jeeva, T. Nagarajan, Vijayalakshmi Parthasarathy
IET Signal Process., published 2016-10-31. DOI: 10.1049/iet-spr.2016.0125 (https://doi.org/10.1049/iet-spr.2016.0125)
Conventional multiband speech enhancement splits the spectrum into several frequency bands and enhances the speech in each band independently. However, because of the pole-interaction problem in the spectral domain, weaker formants are suppressed by the influence of formants in neighbouring bands, and clean speech estimated from them may be of poor quality. To reduce the influence of stronger formants on neighbouring bands, the current work estimates clean speech by first filtering the unprocessed speech in the temporal domain into subbands based on the equivalent rectangular bandwidth (ERB), and then applying discrete cosine transform (DCT) based spectral enhancement in each band using either spectral subtraction or a minimum mean square error (MMSE) estimator. To enhance the speech further, two refinements are proposed: a spectral subtraction approach that incorporates a band-specific weighting factor derived from the signal-to-noise ratio (SNR) of the respective band, and an MMSE estimator that calculates the a priori speech presence/absence probability from the local and global a priori SNR rather than using a fixed or equiprobable value. The algorithms are evaluated using the perceptual evaluation of speech quality (PESQ) and a composite speech quality measure. The DCT-derived spectrum based temporal-domain multiband speech enhancement algorithm is observed to outperform the existing techniques for car, babble, train, white, and factory noise at SNR levels of 0 to 10 dB.
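The pipeline described in the abstract (temporal-domain filtering into ERB-spaced subbands, then DCT-domain spectral subtraction with a band-specific weight) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`erb_band_edges`, `enhance_band`, `multiband_enhance`), the Butterworth band-pass filters, the non-overlapping frames, the noise estimate taken from the leading frames, and the Berouti-style mapping from band SNR to the subtraction weight are all assumptions made for illustration. The MMSE variant is not sketched.

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.signal import butter, sosfiltfilt


def erb_band_edges(f_lo, f_hi, n_bands):
    """Band edges (Hz) equally spaced on the Glasberg-Moore ERB-rate scale."""
    e_lo = 21.4 * np.log10(1.0 + 0.00437 * f_lo)
    e_hi = 21.4 * np.log10(1.0 + 0.00437 * f_hi)
    e = np.linspace(e_lo, e_hi, n_bands + 1)
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def enhance_band(band, frame_len, noise_frames, floor=0.02):
    """DCT-domain power spectral subtraction on one subband signal."""
    n_frames = len(band) // frame_len
    used = n_frames * frame_len
    # Non-overlapping rectangular frames (a simplification).
    frames = band[:used].reshape(n_frames, frame_len)
    coeffs = dct(frames, type=2, norm="ortho", axis=1)
    power = coeffs ** 2
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_power = power[:noise_frames].mean(axis=0)
    # Rough band SNR in dB, used to choose the subtraction weight.
    noise_level = max(noise_power.mean(), 1e-12)
    speech_level = max(power.mean() - noise_level, 1e-12)
    snr_db = 10.0 * np.log10(speech_level / noise_level)
    # Assumption: Berouti-style weight, larger at low SNR, clipped to [1, 5].
    alpha = float(np.clip(4.0 - 0.15 * snr_db, 1.0, 5.0))
    clean_power = np.maximum(power - alpha * noise_power, floor * power)
    # DCT coefficients are real, so the "phase" is just the sign.
    clean = np.sign(coeffs) * np.sqrt(clean_power)
    out = np.zeros_like(band)  # samples past the last full frame stay zero
    out[:used] = idct(clean, type=2, norm="ortho", axis=1).ravel()
    return out, alpha


def multiband_enhance(noisy, fs, n_bands=8, frame_len=256, noise_frames=6):
    """Filter into ERB-spaced subbands, enhance each one, and sum the results."""
    edges = erb_band_edges(50.0, 0.45 * fs, n_bands)
    enhanced = np.zeros_like(noisy)
    alphas = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Zero-phase Butterworth band-pass filter applied in the time domain.
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, noisy)
        out, alpha = enhance_band(band, frame_len, noise_frames)
        enhanced += out
        alphas.append(alpha)
    return enhanced, alphas
```

Calling `multiband_enhance(noisy, fs)` on a floating-point signal whose opening frames are noise-only returns the enhanced signal together with the per-band weights, which makes it easy to inspect how strongly each band was attenuated. A practical system would use overlapping windowed frames and a running noise estimate instead of the simplifications made here.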