A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal

Multimedia Tools and Applications · IF 3.0 · JCR Q2 (Computer Science, Information Systems) · CAS Zone 4 (Computer Science) · Pub Date: 2024-09-05 · DOI: 10.1007/s11042-024-20067-4
Ranita Khumukcham, Kishorjit Nongmeikapam
{"title":"A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal","authors":"Ranita Khumukcham, Kishorjit Nongmeikapam","doi":"10.1007/s11042-024-20067-4","DOIUrl":null,"url":null,"abstract":"<p>Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with voice for implementation in a realistic environment. Another issue is the need to suppress the ambient noise that could be mixed up with the spectra of the voice. Current work proposes a robust, less time-consuming and non-invasive technique for the identification of pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that encompasses a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim here is to estimate the noise spectra, to regenerate a clean signal and finally to deliver a completely fundamental glottal flow-derived signal. For the next stage, clean glottal derivative signals are used in the formation of a novel fused-scalogram which is currently referred to as the \"Combinatorial Transformative Scalogram (CTS).\" The CTS is a time-frequency domain plot which is a combination of two time-frequency scalograms. There is a thorough investigation of the performance of the two individual scalograms as well as that of the CTS database.Nine classification metrics are used to investigate performance, which are: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Implementation of the VOice ICar fEDerico II (VOICED) standard database provided the highest mean accuracy of 94.12<span>\\(\\%\\)</span> with a sensitivity of 93.85<span>\\(\\%\\)</span> and a specificity of 97.96<span>\\(\\%\\)</span> against other existing techniques. The current method performed well despite the data imbalance that exists between classes.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Tools and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11042-024-20067-4","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with the voice before it is deployed in a realistic environment. Another issue is the need to suppress the ambient noise that can mix with the spectra of the voice. The current work proposes a robust, less time-consuming and non-invasive technique for identifying the pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that combines a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim is to estimate the noise spectra, regenerate a clean signal and finally deliver a signal derived purely from the fundamental glottal flow. In the next stage, the clean glottal derivative signals are used to form a novel fused scalogram, referred to here as the "Combinatorial Transformative Scalogram (CTS)". The CTS is a time-frequency domain plot obtained by combining two time-frequency scalograms. The performance of the two individual scalograms, as well as that of the CTS database, is investigated thoroughly. Nine classification metrics are used to assess performance: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen's kappa, Matthews Correlation Coefficient, and F1 score. Evaluation on the VOice ICar fEDerico II (VOICED) standard database yielded the highest mean accuracy of 94.12%, with a sensitivity of 93.85% and a specificity of 97.96%, compared with other existing techniques. The current method performed well despite the data imbalance that exists between classes.
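The abstract names glottal inverse filtering as part of the signal-cleaning pipeline but gives no implementation detail. The sketch below is a generic, textbook-style LPC inverse-filtering step, not the authors' score-based geometric plus inverse-filtering method; the sampling rate, the LPC-order rule of thumb, and the Hann windowing are illustrative assumptions.

```python
# Minimal sketch of LPC-based glottal inverse filtering (generic formulation,
# not the paper's specific two-stage pipeline).
import numpy as np
import scipy.signal as sig
import librosa

def glottal_flow_derivative(frame, fs=16000, lpc_order=None):
    """Approximate the glottal flow derivative of one voiced speech frame."""
    if lpc_order is None:
        lpc_order = 2 + fs // 1000                      # common rule of thumb
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    a = librosa.lpc(windowed, order=lpc_order)          # vocal-tract AR model A(z)
    # Inverse filtering: applying A(z) removes the estimated vocal-tract
    # resonances, leaving a residual that approximates the glottal flow derivative.
    return sig.lfilter(a, [1.0], frame)
```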

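The abstract describes the CTS as a fusion of two time-frequency scalograms but does not state which two transforms are combined or how. The sketch below only illustrates the general idea with two continuous-wavelet scalograms (Morlet and Mexican hat) fused by averaging their normalised magnitudes; the wavelet choices, the scale grid, and the averaging rule are all assumptions made for illustration.

```python
# Hypothetical sketch of fusing two time-frequency scalograms into a single
# "combinatorial" scalogram, in the spirit of the CTS described in the abstract.
import numpy as np
import pywt

def scalogram(signal, wavelet, scales, fs):
    """Magnitude scalogram of a 1-D signal via the continuous wavelet transform."""
    coeffs, _ = pywt.cwt(signal, scales, wavelet, sampling_period=1.0 / fs)
    return np.abs(coeffs)

def combinatorial_scalogram(glottal_signal, fs=16000, n_scales=64):
    """Fuse two CWT scalograms of a (clean) glottal-flow-derived signal."""
    scales = np.arange(1, n_scales + 1)
    s1 = scalogram(glottal_signal, "morl", scales, fs)   # Morlet scalogram
    s2 = scalogram(glottal_signal, "mexh", scales, fs)   # Mexican-hat scalogram
    # Normalise each scalogram to [0, 1] so neither dominates the fusion.
    s1 /= s1.max() + 1e-12
    s2 /= s2.max() + 1e-12
    # Simple element-wise fusion; the actual CTS combination rule may differ.
    return 0.5 * (s1 + s2)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 1.0, 1.0 / fs)
    toy_glottal = np.sin(2 * np.pi * 120 * t) * np.exp(-3 * t)  # toy stand-in signal
    cts = combinatorial_scalogram(toy_glottal, fs)
    print(cts.shape)  # (n_scales, len(signal)); usable as an image for a classifier
```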

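On the evaluation side, the nine reported metrics can all be derived from the multiclass confusion matrix. A minimal sketch with scikit-learn follows; the macro-averaging over pathology classes and the toy labels are assumptions, since the abstract only lists the metric names.

```python
# Minimal sketch of the nine reported metrics, computed on hypothetical
# multiclass predictions; macro-averaging is an assumption.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, confusion_matrix,
                             f1_score, matthews_corrcoef, precision_score,
                             recall_score)

def report_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    # Per-class one-vs-rest counts for specificity and false positive rate.
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    acc = accuracy_score(y_true, y_pred)
    return {
        "sensitivity": recall_score(y_true, y_pred, average="macro"),
        "mean accuracy": acc,
        "error": 1.0 - acc,
        "precision": precision_score(y_true, y_pred, average="macro"),
        "false positive rate": np.mean(fp / (fp + tn)),
        "specificity": np.mean(tn / (tn + fp)),
        "Cohen's kappa": cohen_kappa_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "F1 score": f1_score(y_true, y_pred, average="macro"),
    }

if __name__ == "__main__":
    # Toy labels standing in for multiclass pathology predictions.
    y_true = [0, 0, 1, 1, 2, 2, 2, 1]
    y_pred = [0, 1, 1, 1, 2, 2, 0, 1]
    for name, value in report_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```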