滤波器组结构对通过倒谱系数捕获异常信息的意义

Laxmi Priya Sahu, G. Pradhan
{"title":"滤波器组结构对通过倒谱系数捕获异常信息的意义","authors":"Laxmi Priya Sahu, G. Pradhan","doi":"10.1109/SPCOM55316.2022.9840837","DOIUrl":null,"url":null,"abstract":"The short-term Fourier transform magnitude spectra (STFT-MS) computed from the dysarthric speech deviates nonlinearly from the normal speech in different frequency bands depending on underlying sound units. This discriminating information can be captured by segmenting the STFT-MS into different frequency bands following the power spectra of board categories of sound units. Motivated by this observation in this study, we have computed the cepstral coefficients by analyzing the STFT-MS in 0–500 Hz, 500–2000 Hz, 2000–4000 Hz, and 4000 – 8000Hz, respectively for 16 kHz sampled speech data. Each of the selected frequency bands is analyzed by using a 30 channel Mel filterbank. The log filterbank energies computed for each sub-band are then polled together and discrete cosine transform (DCT) is applied to compute the cepstral coefficients, here termed as sub-band enhanced Mel frequency cepstral coefficients (SE-MFCC). The i-vector based dysarthric intelligibility assessment system reported in this study shows that the SEMFCC outperforms the conventional Mel frequency cepstral coefficients (MFCC), and the cepstral coefficients computed using inverse-Mel filterbank (IMFCC), and linear filterbank (LFCC). The score level combination of SE-MFCC with the MFCC further improves the overall performance.","PeriodicalId":246982,"journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Significance of Filterbank Structure for Capturing Dysarthric Information through Cepstral Coefficients\",\"authors\":\"Laxmi Priya Sahu, G. Pradhan\",\"doi\":\"10.1109/SPCOM55316.2022.9840837\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The short-term Fourier transform magnitude spectra (STFT-MS) computed from the dysarthric speech deviates nonlinearly from the normal speech in different frequency bands depending on underlying sound units. This discriminating information can be captured by segmenting the STFT-MS into different frequency bands following the power spectra of board categories of sound units. Motivated by this observation in this study, we have computed the cepstral coefficients by analyzing the STFT-MS in 0–500 Hz, 500–2000 Hz, 2000–4000 Hz, and 4000 – 8000Hz, respectively for 16 kHz sampled speech data. Each of the selected frequency bands is analyzed by using a 30 channel Mel filterbank. The log filterbank energies computed for each sub-band are then polled together and discrete cosine transform (DCT) is applied to compute the cepstral coefficients, here termed as sub-band enhanced Mel frequency cepstral coefficients (SE-MFCC). The i-vector based dysarthric intelligibility assessment system reported in this study shows that the SEMFCC outperforms the conventional Mel frequency cepstral coefficients (MFCC), and the cepstral coefficients computed using inverse-Mel filterbank (IMFCC), and linear filterbank (LFCC). The score level combination of SE-MFCC with the MFCC further improves the overall performance.\",\"PeriodicalId\":246982,\"journal\":{\"name\":\"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPCOM55316.2022.9840837\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPCOM55316.2022.9840837","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

短期傅里叶变换幅度谱(STFT-MS)计算从困难的语音在不同的频带非线性偏离正常的语音取决于底层的声音单位。这种鉴别信息可以通过将STFT-MS分割成不同的频带来捕获,这些频带是根据板类声音单元的功率谱划分的。基于这一观察结果,我们通过分析16 kHz采样语音数据在0-500 Hz、500-2000 Hz、2000-4000 Hz和4000 - 8000Hz的STFT-MS分别计算了倒谱系数。使用30通道Mel滤波器组对每个选定的频段进行分析。然后对每个子带计算的对数滤波器组能量进行轮询,并应用离散余弦变换(DCT)计算倒谱系数,这里称为子带增强Mel频率倒谱系数(SE-MFCC)。本研究中基于i向量的困难理解度评估系统表明,SEMFCC优于传统的Mel频率倒谱系数(MFCC),以及使用逆Mel滤波器组(IMFCC)和线性滤波器组(LFCC)计算的倒谱系数。SE-MFCC与MFCC的评分水平结合进一步提高了整体表现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Significance of Filterbank Structure for Capturing Dysarthric Information through Cepstral Coefficients
The short-term Fourier transform magnitude spectra (STFT-MS) computed from the dysarthric speech deviates nonlinearly from the normal speech in different frequency bands depending on underlying sound units. This discriminating information can be captured by segmenting the STFT-MS into different frequency bands following the power spectra of board categories of sound units. Motivated by this observation in this study, we have computed the cepstral coefficients by analyzing the STFT-MS in 0–500 Hz, 500–2000 Hz, 2000–4000 Hz, and 4000 – 8000Hz, respectively for 16 kHz sampled speech data. Each of the selected frequency bands is analyzed by using a 30 channel Mel filterbank. The log filterbank energies computed for each sub-band are then polled together and discrete cosine transform (DCT) is applied to compute the cepstral coefficients, here termed as sub-band enhanced Mel frequency cepstral coefficients (SE-MFCC). The i-vector based dysarthric intelligibility assessment system reported in this study shows that the SEMFCC outperforms the conventional Mel frequency cepstral coefficients (MFCC), and the cepstral coefficients computed using inverse-Mel filterbank (IMFCC), and linear filterbank (LFCC). The score level combination of SE-MFCC with the MFCC further improves the overall performance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
C-Band Iris Coupled Cavity Bandpass Filter A Wideband Bandpass Filter using U-shaped slots on SIW with two Notches at 8 GHz and 10 GHz Semi-Blind Technique for Frequency Selective Channel Estimation in Millimeter-Wave MIMO Coded FBMC System Binary Intelligent Reflecting Surfaces Assisted OFDM Systems Improving the Performance of Zero-Resource Children’s ASR System through Formant and Duration Modification based Data Augmentation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1