Pathological voice classification using MEEL features and SVM-TabNet model

IF 2.4 · Zone 3 (Computer Science) · Q2 ACOUSTICS · Speech Communication · Pub Date: 2024-07-01 · DOI: 10.1016/j.specom.2024.103100

Mohammed Zakariah, Muna Al-Razgan, Taha Alfakih

Speech Communication, vol. 162, Article 103100. https://www.sciencedirect.com/science/article/pii/S0167639324000724

Citations: 0

Abstract

In clinical settings, early diagnosis and objective assessment depend on detecting voice pathology. This work classifies pathological voices by combining MEEL (Mel-Frequency Energy Line) features with an SVM-TabNet fusion model. The dataset consists of 1037 speech files, including recordings from people with laryngocele and vox senilis as well as from healthy speakers. The main goal is an efficient classification model that can differentiate between normal and abnormal voice patterns. Existing techniques frequently lack the accuracy required for a precise diagnosis, which motivates new approaches. In the proposed approach, MEEL features are extracted first and then classified by the SVM-TabNet fusion model. The MEEL features capture complex patterns in the audio signal, and combining the SVM and TabNet models improves classification performance. On the test data, the model achieves 99.7 % accuracy, 0.992 F1 score, 0.996 precision, and 0.995 recall. Testing on additional datasets yields 99.4 % accuracy, 0.99 F1 score, 0.998 precision, and 0.989 recall. On the Saarbruecken Voice Database (SVD), the proposed method achieves an accuracy of 99.97 %, indicating robustness and generalizability across datasets. Overall, this work shows how the SVM-TabNet fusion model with MEEL features may be used to accurately and consistently classify pathological voices, offering encouraging opportunities for clinical diagnosis and therapy tracking.
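The abstract reports accuracy, precision, recall, and F1 but does not say how they were averaged across classes. As a reading aid, the following minimal sketch shows the standard binary definitions, assuming a single positive class; the function names and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not the paper's data).
    print(metrics_from_counts(tp=90, fp=10, fn=5, tn=95))

    # Harmonic mean of the precision/recall pairs quoted in the abstract.
    print(f1_from_precision_recall(0.996, 0.995))
    print(f1_from_precision_recall(0.998, 0.989))
```

Under this single-class definition, precision 0.996 and recall 0.995 give an F1 of about 0.9955, slightly above the reported 0.992, while 0.998 and 0.989 give about 0.993, which matches the reported 0.99 after rounding. A small gap of this kind can arise if the reported scores are averaged over several classes, but the abstract does not specify this.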


Source journal

Speech Communication (Engineering & Technology - Computer Science: Interdisciplinary Applications)

CiteScore: 6.80

Self-citation rate: 6.20%

Review time: 19.2 weeks

About the journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.
