{"title":"基于线性预测误差方差的难发音语音信号的改进静-静音-浊音分割","authors":"T. Ijitona, Hong Yue, J. Soraghan, A. Lowit","doi":"10.1109/ICCCS49078.2020.9118462","DOIUrl":null,"url":null,"abstract":"A novel algorithm for the segmentation of dysarthric speech into silence, unvoiced and voiced (SUV) segments is presented. The proposed algorithm is based on the combination of short-time energy (STE), zero-crossing rate (ZCR) and linear prediction error variance (LPEV) or the segmentation problem. Extending the previous work in this field, the proposed method will address the difficulties in distinguishing between voiced and unvoiced segments in dysarthric speech. More precisely, the error variance of the linear prediction coefficients will be used to design a three-fold decision matrix that can accommodate the high variability in loudness experienced in dysarthric speech. In addition, a moving average threshold approach will be proposed in order to provide an “as-fit” segmentation technique that is fully automated and that will be able to handle highly severe dysarthric speech with varying loudness and ZCRs. The ability of the proposed fully-automated algorithm will be validated using real speech samples from healthy speakers, and speakers with ataxic dysarthria. The results of the proposed approach are compared with known methods using STE and ZCR. It is observed that the proposed classification method does not only show an improvement in segmentation performance but also provides consistent results in low signal energy situations.","PeriodicalId":105556,"journal":{"name":"2020 5th International Conference on Computer and Communication Systems (ICCCS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Improved Silence-Unvoiced-Voiced (SUV) Segmentation for Dysarthric Speech Signals using Linear Prediction Error Variance\",\"authors\":\"T. Ijitona, Hong Yue, J. Soraghan, A. Lowit\",\"doi\":\"10.1109/ICCCS49078.2020.9118462\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A novel algorithm for the segmentation of dysarthric speech into silence, unvoiced and voiced (SUV) segments is presented. The proposed algorithm is based on the combination of short-time energy (STE), zero-crossing rate (ZCR) and linear prediction error variance (LPEV) or the segmentation problem. Extending the previous work in this field, the proposed method will address the difficulties in distinguishing between voiced and unvoiced segments in dysarthric speech. More precisely, the error variance of the linear prediction coefficients will be used to design a three-fold decision matrix that can accommodate the high variability in loudness experienced in dysarthric speech. In addition, a moving average threshold approach will be proposed in order to provide an “as-fit” segmentation technique that is fully automated and that will be able to handle highly severe dysarthric speech with varying loudness and ZCRs. The ability of the proposed fully-automated algorithm will be validated using real speech samples from healthy speakers, and speakers with ataxic dysarthria. The results of the proposed approach are compared with known methods using STE and ZCR. It is observed that the proposed classification method does not only show an improvement in segmentation performance but also provides consistent results in low signal energy situations.\",\"PeriodicalId\":105556,\"journal\":{\"name\":\"2020 5th International Conference on Computer and Communication Systems (ICCCS)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Computer and Communication Systems (ICCCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCCS49078.2020.9118462\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Computer and Communication Systems (ICCCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCS49078.2020.9118462","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improved Silence-Unvoiced-Voiced (SUV) Segmentation for Dysarthric Speech Signals using Linear Prediction Error Variance
A novel algorithm for the segmentation of dysarthric speech into silence, unvoiced and voiced (SUV) segments is presented. The proposed algorithm is based on the combination of short-time energy (STE), zero-crossing rate (ZCR) and linear prediction error variance (LPEV) or the segmentation problem. Extending the previous work in this field, the proposed method will address the difficulties in distinguishing between voiced and unvoiced segments in dysarthric speech. More precisely, the error variance of the linear prediction coefficients will be used to design a three-fold decision matrix that can accommodate the high variability in loudness experienced in dysarthric speech. In addition, a moving average threshold approach will be proposed in order to provide an “as-fit” segmentation technique that is fully automated and that will be able to handle highly severe dysarthric speech with varying loudness and ZCRs. The ability of the proposed fully-automated algorithm will be validated using real speech samples from healthy speakers, and speakers with ataxic dysarthria. The results of the proposed approach are compared with known methods using STE and ZCR. It is observed that the proposed classification method does not only show an improvement in segmentation performance but also provides consistent results in low signal energy situations.