同一个体的多重语音障碍:调查手工特征、多标签分类算法和基础学习器

IF 2.4 3区 计算机科学 Q2 ACOUSTICS Speech Communication Pub Date : 2023-07-01 DOI:10.1016/j.specom.2023.102952
Sylvio Barbon Junior , Rodrigo Capobianco Guido , Gabriel Jonas Aguiar , Everton José Santana , Mario Lemes Proença Junior , Hemant A. Patil
{"title":"同一个体的多重语音障碍:调查手工特征、多标签分类算法和基础学习器","authors":"Sylvio Barbon Junior ,&nbsp;Rodrigo Capobianco Guido ,&nbsp;Gabriel Jonas Aguiar ,&nbsp;Everton José Santana ,&nbsp;Mario Lemes Proença Junior ,&nbsp;Hemant A. Patil","doi":"10.1016/j.specom.2023.102952","DOIUrl":null,"url":null,"abstract":"<div><p>Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically-affected. Nevertheless, the latter state is not always a result of a sole laryngeal issue, i.e., multiple disorders might exist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate the application of five multi-label classification methods based on problem transformation to play the role of base-learners, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance with Random Forest (RF) and Support Vector Machine (SVM), in addition to a Deep Neural Network (DNN) from an algorithm adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke’s Edema, Vox Senilis, and Central Laryngeal Motion Disorder. Receiving as input three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception, we observed that the DNN-based approach powered with SE-based feature vectors presented the best values of F1-score among the tested methods, i.e., 0.943, as the averaged value from all the balancing scenarios, under Saarbrücken Voice Database (SVD) and considering 20% of balancing rate with Synthetic Minority Over-sampling Technique (SMOTE). Finally, our findings of most false negatives for laryngitis may explain the reason why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state-of-the-art in the field of handcrafted acoustic-based non-invasive diagnosis of voice disorders.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners\",\"authors\":\"Sylvio Barbon Junior ,&nbsp;Rodrigo Capobianco Guido ,&nbsp;Gabriel Jonas Aguiar ,&nbsp;Everton José Santana ,&nbsp;Mario Lemes Proença Junior ,&nbsp;Hemant A. Patil\",\"doi\":\"10.1016/j.specom.2023.102952\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically-affected. Nevertheless, the latter state is not always a result of a sole laryngeal issue, i.e., multiple disorders might exist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate the application of five multi-label classification methods based on problem transformation to play the role of base-learners, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance with Random Forest (RF) and Support Vector Machine (SVM), in addition to a Deep Neural Network (DNN) from an algorithm adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke’s Edema, Vox Senilis, and Central Laryngeal Motion Disorder. Receiving as input three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception, we observed that the DNN-based approach powered with SE-based feature vectors presented the best values of F1-score among the tested methods, i.e., 0.943, as the averaged value from all the balancing scenarios, under Saarbrücken Voice Database (SVD) and considering 20% of balancing rate with Synthetic Minority Over-sampling Technique (SMOTE). Finally, our findings of most false negatives for laryngitis may explain the reason why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state-of-the-art in the field of handcrafted acoustic-based non-invasive diagnosis of voice disorders.</p></div>\",\"PeriodicalId\":49485,\"journal\":{\"name\":\"Speech Communication\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2023-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Speech Communication\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167639323000869\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech Communication","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167639323000869","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

摘要

声音障碍的非侵入性声学分析已成为当前生物医学研究的前沿。通常的策略基本上是基于机器学习(ML)算法,通常将受试者分为健康或病理影响。然而,后一种状态并不总是单一喉部问题的结果,即可能存在多种疾病,需要多标签分类程序才能有效诊断。因此,本文的目的是研究基于问题变换的五种多标签分类方法的应用,即标签Powerset、二进制关联、嵌套堆叠、分类器链和依赖二进制关联,以及随机森林(RF)和支持向量机(SVM),以及来自算法自适应方法的深度神经网络(DNN),以发挥基本学习器的作用,检测多种语音障碍,如语音障碍、喉炎、赖因克水肿、老年性声带和中枢性喉运动障碍。接收三个手工特征,即信号能量(SE)、过零率(zcr)和信号熵(SH)作为输入,这些特征允许在语音分析、产生和感知方面提供可解释的描述符,我们观察到基于SE的特征向量支持的基于dnn的方法在测试方法中具有最佳的f1得分值,即0.943,作为所有平衡场景的平均值。在saarbr cken语音数据库(SVD)下,使用合成少数派过采样技术(SMOTE)考虑20%的平衡率。最后,我们发现喉炎的大多数假阴性可能解释了为什么它的检测在语音技术中是一个严重的问题。我们报告的结果提供了一个原始的贡献,允许对多种语言病理进行一致的检测,并推进了语音障碍的手工声学非侵入性诊断领域的最新技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners

Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically-affected. Nevertheless, the latter state is not always a result of a sole laryngeal issue, i.e., multiple disorders might exist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate the application of five multi-label classification methods based on problem transformation to play the role of base-learners, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance with Random Forest (RF) and Support Vector Machine (SVM), in addition to a Deep Neural Network (DNN) from an algorithm adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke’s Edema, Vox Senilis, and Central Laryngeal Motion Disorder. Receiving as input three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception, we observed that the DNN-based approach powered with SE-based feature vectors presented the best values of F1-score among the tested methods, i.e., 0.943, as the averaged value from all the balancing scenarios, under Saarbrücken Voice Database (SVD) and considering 20% of balancing rate with Synthetic Minority Over-sampling Technique (SMOTE). Finally, our findings of most false negatives for laryngitis may explain the reason why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state-of-the-art in the field of handcrafted acoustic-based non-invasive diagnosis of voice disorders.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Speech Communication
Speech Communication 工程技术-计算机:跨学科应用
CiteScore
6.80
自引率
6.20%
发文量
94
审稿时长
19.2 weeks
期刊介绍: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal''s primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.
期刊最新文献
A corpus of audio-visual recordings of linguistically balanced, Danish sentences for speech-in-noise experiments Forms, factors and functions of phonetic convergence: Editorial Feasibility of acoustic features of vowel sounds in estimating the upper airway cross sectional area during wakefulness: A pilot study Zero-shot voice conversion based on feature disentanglement Multi-modal co-learning for silent speech recognition based on ultrasound tongue images
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1