Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals

Rumana Islam, Esam Abdel-Raheem, Mohammed Tarique
Computer Methods and Programs in Biomedicine Update, volume 2, Article 100074. Published 2022-01-01. DOI: 10.1016/j.cmpbup.2022.100074
Full text: https://www.sciencedirect.com/science/article/pii/S2666990022000258
Citation count: 8

Abstract

This paper presents an automated, noninvasive voice pathology detection system based on convolutional neural networks (CNNs). The proposed system works in two steps: it first discriminates pathological voices from healthy ones, and then classifies each voice identified as pathological into one of three pathologies. Two CNNs are used for these purposes: one acts as a binary classifier that identifies pathological voices, and the other acts as a multiclass classifier that categorizes the voice pathologies. This work investigates the effectiveness of electroglottographic (EGG) and speech signals for detecting and classifying pathological voices using sustained vowel ('/a/') samples. EGG signals capture the vibratory pattern of the vocal folds during voiced sounds, whereas speech signals add spectral color to the EGG signals. Hence, their contributions to pathology identification and categorization differ, as demonstrated in this work. The Saarbrücken Voice Database (SVD) is used in this investigation. The results show that, in identifying pathological voices from healthy ones, the proposed system is more than 9% more accurate with speech signals than with EGG signals. In categorizing pathological voices into different pathology types, however, it is more than 12% more accurate with EGG signals than with speech signals. The performance of the proposed system with these two signals is compared in terms of clinical and statistical measures. The results are also compared with those of other related published works.
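
The two-step decision flow described in the abstract can be sketched as a simple cascade. This is a minimal illustration only, not the authors' implementation: the function `two_stage_predict`, the 0.5 threshold, the placeholder label names, and the toy stand-in classifiers are all assumptions introduced here, since the abstract names neither the three pathologies nor any network details. In practice the two callables would be the trained binary and multiclass CNNs.

```python
from statistics import pstdev
from typing import Callable, Sequence

# Placeholder names (assumption): the abstract does not name the three pathologies.
PATHOLOGY_LABELS = ("pathology_A", "pathology_B", "pathology_C")


def two_stage_predict(
    signal: Sequence[float],
    detector: Callable[[Sequence[float]], float],
    categorizer: Callable[[Sequence[float]], Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Detect-then-classify cascade for one recording.

    detector    -- stand-in for the binary CNN; returns P(pathological).
    categorizer -- stand-in for the multiclass CNN; returns three class scores.
    """
    # Stage 1: healthy vs. pathological.
    if detector(signal) < threshold:
        return "healthy"
    # Stage 2: only recordings flagged as pathological reach the second model.
    scores = list(categorizer(signal))
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PATHOLOGY_LABELS[best]


# Toy stand-ins so the sketch runs without trained networks (hypothetical).
def toy_detector(signal: Sequence[float]) -> float:
    # Uses signal spread as a fake "pathology probability", capped at 1.0.
    return min(pstdev(signal), 1.0)


def toy_categorizer(signal: Sequence[float]) -> Sequence[float]:
    # Fixed scores; a real model would compute these from the signal.
    return [0.1, 0.7, 0.2]


quiet = [0.0] * 100        # zero spread, so stage 1 returns "healthy"
noisy = [1.0, -1.0] * 50   # spread of 1.0, so stage 2 is consulted

print(two_stage_predict(quiet, toy_detector, toy_categorizer))
print(two_stage_predict(noisy, toy_detector, toy_categorizer))
```

Passing the classifiers in as callables keeps the cascade independent of the input signal type, so the same flow could be run once with EGG-based models and once with speech-based models when comparing the two.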
