Improved Silence-Unvoiced-Voiced (SUV) Segmentation for Dysarthric Speech Signals using Linear Prediction Error Variance

2020 5th International Conference on Computer and Communication Systems (ICCCS). Pub Date: 2020-05-01. DOI: 10.1109/ICCCS49078.2020.9118462

T. Ijitona, Hong Yue, J. Soraghan, A. Lowit


Citations: 4

Abstract



Improved Silence-Unvoiced-Voiced (SUV) Segmentation for Dysarthric Speech Signals using Linear Prediction Error Variance

A novel algorithm for the segmentation of dysarthric speech into silence, unvoiced and voiced (SUV) segments is presented. The proposed algorithm is based on the combination of short-time energy (STE), zero-crossing rate (ZCR) and linear prediction error variance (LPEV) for the segmentation problem. Extending previous work in this field, the proposed method addresses the difficulty of distinguishing between voiced and unvoiced segments in dysarthric speech. More precisely, the linear prediction error variance is used to design a three-fold decision matrix that can accommodate the high variability in loudness found in dysarthric speech. In addition, a moving-average threshold approach is proposed to provide an “as-fit” segmentation technique that is fully automated and can handle severely dysarthric speech with varying loudness and ZCRs. The proposed fully automated algorithm is validated using real speech samples from healthy speakers and from speakers with ataxic dysarthria. The results of the proposed approach are compared with known methods using STE and ZCR. The proposed classification method not only improves segmentation performance but also provides consistent results in low signal energy situations.
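
The three frame-level features named in the abstract, and a moving-average threshold, can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no frame length, prediction order, window, or the contents of the three-fold decision matrix, so every function name, default value, and the choice to normalize the prediction error by frame energy is hypothetical. The decision matrix itself is deliberately not reproduced.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """STE: sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """ZCR: fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def lp_error_variance(frame, order=10, normalized=True):
    """Linear prediction error of one frame (autocorrelation method,
    Levinson-Durbin recursion).

    Assumption (not from the paper): with normalized=True the error is
    divided by the frame energy r[0], giving a loudness-independent value
    in [0, 1]; otherwise the mean squared prediction error is returned.
    """
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(w)
    r = np.correlate(w, w, mode="full")[n - 1:n + order]  # lags 0..order
    if r[0] <= 0:
        return 0.0  # all-zero frame: nothing to predict
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 1e-12 * r[0]:  # frame is (numerically) fully predictable
            break
    err = max(err, 0.0)
    return err / r[0] if normalized else err / n


def moving_average(values, width):
    """Centered moving average (zero-padded at the edges), usable as an
    adaptive per-frame threshold instead of a fixed global one."""
    values = np.asarray(values, dtype=float)
    width = min(width, len(values))
    return np.convolve(values, np.ones(width) / width, mode="same")


# Usage sketch on one second of synthetic noise at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
frames = frame_signal(rng.standard_normal(16000), frame_len=400, hop=160)
ste = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
lpev = np.array([lp_error_variance(f) for f in frames])
low_energy = ste < moving_average(ste, width=25)  # per-frame boolean flags
```

A voiced-like frame (for example a sum of low-frequency sinusoids) is highly predictable and yields a small normalized prediction error and a low ZCR, while a noise-like frame yields values near 1 and a ZCR near 0.5. How the paper combines these flags into silence, unvoiced and voiced labels is not stated in the abstract.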

