Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals

Computer Methods and Programs in Biomedicine Update, vol. 2, Article 100074. Pub Date: 2022-01-01. DOI: 10.1016/j.cmpbup.2022.100074

Rumana Islam, Esam Abdel-Raheem, Mohammed Tarique


Citations: 8

Abstract

This paper presents an automated, noninvasive voice pathology detection system based on convolutional neural networks (CNNs). The proposed system works in two steps: it first discriminates pathological voices from healthy ones, and then classifies the voices identified as pathological into one of three pathologies. Two CNNs are used for these purposes: one works as a binary classifier to identify pathological voices, and the other works as a multiclass classifier to categorize the voice pathologies. This work investigates the effectiveness of electroglottographic (EGG) and speech signals for detecting and classifying pathological voices using sustained vowel (/a/) samples. EGG signals can assess the vibratory pattern of the vocal folds during voiced sounds. Speech signals, on the other hand, add spectral color to the EGG signals. Hence, their contributions to pathology identification and categorization differ, as demonstrated in this work. The Saarbrücken Voice Database (SVD) is used in this investigation. The results show that the proposed system identifies pathological voices with more than 9% higher accuracy using speech signals than using EGG signals. However, categorizing pathological voices into different pathology types is more than 12% more accurate with EGG signals than with speech signals. The performance of the proposed system with these two signals is compared in terms of clinical and statistical measures. The results obtained are also compared with those of other related published works.
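
The two-step flow described in the abstract can be sketched roughly as follows. This is only an illustration of the cascade logic, not the authors' implementation: the abstract does not give the CNN architectures, the decision threshold, or the names of the three pathologies, so the label names, the 0.5 threshold, and the toy stand-in models below are all assumptions.

```python
from typing import Callable, Sequence

Signal = Sequence[float]

# Hypothetical placeholders: the abstract does not name the three pathologies.
PATHOLOGY_LABELS = ("pathology_1", "pathology_2", "pathology_3")


def two_stage_predict(
    signal: Signal,
    detector: Callable[[Signal], float],
    classifier: Callable[[Signal], Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Run the binary stage first; run the multiclass stage only if needed."""
    # Stage 1: binary model returns a score for "pathological".
    if detector(signal) < threshold:
        return "healthy"
    # Stage 2: multiclass model returns one score per pathology.
    scores = classifier(signal)
    if len(scores) != len(PATHOLOGY_LABELS):
        raise ValueError("classifier must return one score per pathology")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PATHOLOGY_LABELS[best]


# Toy stand-ins for the two CNNs, used only to exercise the control flow.
def toy_detector(signal: Signal) -> float:
    # Mean absolute amplitude, clipped to [0, 1], plays the role of a score.
    return min(1.0, sum(abs(x) for x in signal) / len(signal))


def toy_classifier(signal: Signal) -> Sequence[float]:
    # Fixed scores; a real model would compute these from the signal.
    return [0.1, 0.7, 0.2]


if __name__ == "__main__":
    print(two_stage_predict([0.0, 0.1], toy_detector, toy_classifier))
    print(two_stage_predict([0.9, 1.0], toy_detector, toy_classifier))
```

In a real system, each stand-in would be replaced by a trained model fed with either the EGG or the speech recording, which is how the two signal types could be compared stage by stage.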
