Rumana Islam, Esam Abdel-Raheem, Mohammed Tarique
Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals
Computer Methods and Programs in Biomedicine Update, vol. 2, Article 100074, 2022
DOI: 10.1016/j.cmpbup.2022.100074
https://www.sciencedirect.com/science/article/pii/S2666990022000258
Citations: 8
Abstract
This paper presents an automated, noninvasive voice pathology detection system based on convolutional neural networks (CNNs). The proposed system works in two steps: it first discriminates pathological voices from healthy ones, and then classifies each voice identified as pathological into one of three pathologies. Two CNNs are used for these purposes: one acts as a binary classifier that identifies pathological voices, and the other acts as a multiclass classifier that categorizes the voice pathologies. This work investigates the effectiveness of electroglottographic (EGG) and speech signals for detecting and classifying pathological voices using sustained vowel (/a/) samples. EGG signals capture the vibratory pattern of the vocal folds during voiced sound, whereas the speech signals add spectral color to the EGG signals. Hence, their contributions to pathology identification and classification differ, as demonstrated in this work. The Saarbrücken Voice Database (SVD) is used in this investigation. The results show that the proposed system identifies pathological voices with more than 9% higher accuracy using speech signals than using EGG signals. In contrast, it categorizes pathological voices into pathology types with more than 12% higher accuracy using EGG signals than using speech signals. A comparative performance analysis of the proposed system with these two signals is presented in terms of clinical and statistical measures, and the results are also compared with those of other related published works.
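The two-step decision flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `two_stage_predict`, the 0.5 threshold, the placeholder labels in `PATHOLOGY_LABELS`, and the toy stand-in models are all assumptions made for illustration, since the abstract names neither the three pathologies nor the CNN architectures. The sketch only shows how a binary detector can gate a multiclass categorizer.

```python
from typing import Callable, Sequence

Signal = Sequence[float]

# Placeholder labels (assumption): the abstract does not name the three pathologies.
PATHOLOGY_LABELS = ("pathology_1", "pathology_2", "pathology_3")


def two_stage_predict(
    signal: Signal,
    detector: Callable[[Signal], float],
    categorizer: Callable[[Signal], Sequence[float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> str:
    """Return 'healthy' or one of PATHOLOGY_LABELS for a single recording."""
    # Step 1: binary decision, healthy vs. pathological.
    if detector(signal) < threshold:
        return "healthy"
    # Step 2: reached only for signals flagged as pathological.
    scores = categorizer(signal)
    best = max(range(len(PATHOLOGY_LABELS)), key=lambda i: scores[i])
    return PATHOLOGY_LABELS[best]


# Toy stand-ins so the sketch runs; trained CNNs would take their place.
def toy_detector(signal: Signal) -> float:
    # Mean absolute amplitude, clipped to 1.0, used as a fake "pathological" score.
    return min(1.0, sum(abs(x) for x in signal) / len(signal))


def toy_categorizer(signal: Signal) -> Sequence[float]:
    # Fixed fake class scores; the second class always wins.
    return [0.2, 0.7, 0.1]


if __name__ == "__main__":
    print(two_stage_predict([0.0, 0.1], toy_detector, toy_categorizer))
    print(two_stage_predict([0.9, 1.0], toy_detector, toy_categorizer))
```

Because the detector and categorizer are passed in as callables, the same cascade could be run once with models trained on speech signals and once with models trained on EGG signals, which mirrors the comparison the abstract reports.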