Pathological voice classification using MEEL features and SVM-TabNet model

IF 2.4 | Zone 3 (Computer Science) | JCR Q2 (Acoustics) | Speech Communication | Pub Date: 2024-07-01 | DOI: 10.1016/j.specom.2024.103100
Mohammed Zakariah, Muna Al-Razgan, Taha Alfakih
Speech Communication, volume 162, Article 103100. Full text: https://www.sciencedirect.com/science/article/pii/S0167639324000724
Citations: 0

Abstract

In clinical settings, early diagnosis and objective assessment depend on the detection of voice pathology. This work classifies pathological voices by combining MEEL (Mel-Frequency Energy Line) features with an SVM-TabNet fusion model. The dataset consists of 1037 speech files, including recordings from people with laryngocele and vox senilis as well as from healthy speakers. The main goal is an efficient classification model that can differentiate between normal and abnormal voice patterns. Existing techniques frequently lack the accuracy required for a precise diagnosis, which motivates new approaches. The proposed approach first extracts MEEL features and then classifies them with the SVM-TabNet fusion model. MEEL features capture complex patterns in the audio signal and so provide rich information for classification, while the fusion combines the advantages of the SVM and TabNet models to improve classification performance. On test data, the model achieves 99.7 % accuracy, an F1 score of 0.992, precision of 0.996, and recall of 0.995. Testing on additional datasets yields 99.4 % accuracy, an F1 score of 0.99, precision of 0.998, and recall of 0.989. On the Saarbruecken Voice Database (SVD), the method achieves an accuracy of 99.97 %, indicating robustness and generalizability across datasets. Overall, this work shows how the SVM-TabNet fusion model with MEEL features can be used to classify pathological voices accurately and consistently, with promising applications in clinical diagnosis and therapy tracking.
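The abstract reports accuracy, precision, recall, and F1 score. As a reminder of how these four quantities relate, here is a minimal sketch of their standard single-class definitions in plain Python. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are illustrative assumptions; the abstract does not say how the authors computed or averaged their metrics.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels (1 = pathological, 0 = healthy); not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

For a single class, F1 is the harmonic mean of precision and recall. When metrics are averaged over several classes, as may be the case for the figures reported here, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.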

Source journal
Speech Communication
Category: Engineering and Technology - Computer Science: Interdisciplinary Applications
CiteScore: 6.80
Self-citation rate: 6.20%
Articles published: 94
Review time: 19.2 weeks
About the journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.