Significance of Filterbank Structure for Capturing Dysarthric Information through Cepstral Coefficients
Laxmi Priya Sahu, G. Pradhan
2022 IEEE International Conference on Signal Processing and Communications (SPCOM), 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840837 (https://doi.org/10.1109/SPCOM55316.2022.9840837)
The short-term Fourier transform magnitude spectra (STFT-MS) computed from dysarthric speech deviate nonlinearly from those of normal speech, and the deviation varies across frequency bands depending on the underlying sound units. This discriminating information can be captured by segmenting the STFT-MS into frequency bands that follow the power spectra of broad categories of sound units. Motivated by this observation, this study computes cepstral coefficients by analyzing the STFT-MS separately in the 0–500 Hz, 500–2000 Hz, 2000–4000 Hz, and 4000–8000 Hz bands for speech sampled at 16 kHz. Each band is analyzed with a 30-channel Mel filterbank. The log filterbank energies computed for the sub-bands are then pooled together, and the discrete cosine transform (DCT) is applied to obtain the cepstral coefficients, termed sub-band enhanced Mel frequency cepstral coefficients (SE-MFCC). In the i-vector based dysarthric intelligibility assessment system reported in this study, SE-MFCC outperforms the conventional Mel frequency cepstral coefficients (MFCC) as well as the cepstral coefficients computed using an inverse-Mel filterbank (IMFCC) and a linear filterbank (LFCC). Score-level combination of SE-MFCC with MFCC further improves the overall performance.
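The sub-band pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives only the four band edges, the 30 channels per band, and the 16 kHz sampling rate. Everything else here is an assumption made for the example: the 25 ms Hamming frames with a 10 ms hop, the 2048-point FFT (chosen so the narrow 0–500 Hz band has enough bins for 30 filters), applying the filters to the magnitude rather than the power spectrum, reading "pooled" as concatenation, the unnormalized DCT-II, and keeping 13 coefficients. The function names `band_mel_filterbank`, `dct_ii`, and `se_mfcc` are hypothetical.

```python
import numpy as np

# Band edges in Hz, taken from the abstract (16 kHz sampled speech).
BANDS_HZ = [(0, 500), (500, 2000), (2000, 4000), (4000, 8000)]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def band_mel_filterbank(f_lo, f_hi, n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the Mel scale within [f_lo, f_hi]."""
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    edges = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Negative values lie outside the triangle and are clipped to zero.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T


def se_mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
            n_fft=2048, n_filters=30, n_ceps=13):
    """Sketch of sub-band Mel cepstra; framing, n_fft and n_ceps are assumptions."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # STFT magnitude spectra

    log_energies = []
    for f_lo, f_hi in BANDS_HZ:
        fb = band_mel_filterbank(f_lo, f_hi, n_filters, n_fft, sample_rate)
        # A small floor avoids log(0) when a filter covers no energy.
        log_energies.append(np.log(mag @ fb.T + 1e-10))

    # "Pooled together" is interpreted here as concatenation: 4 bands x 30 channels.
    pooled = np.concatenate(log_energies, axis=1)
    return dct_ii(pooled, n_ceps)
```

Calling `se_mfcc(x)` on a one-dimensional NumPy array `x` of 16 kHz samples returns one row of `n_ceps` coefficients per frame. Because each band gets its own 30-channel Mel filterbank, the narrow low-frequency bands are resolved far more finely than they would be by a single Mel filterbank spanning 0–8000 Hz, which appears to be the point of the sub-band design.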