Sylvio Barbon Junior, Rodrigo Capobianco Guido, Gabriel Jonas Aguiar, Everton José Santana, Mario Lemes Proença Junior, Hemant A. Patil
Speech Communication (impact factor 2.4; JCR Q2, Acoustics). Published 2023-07-01. DOI: 10.1016/j.specom.2023.102952. Source: https://www.sciencedirect.com/science/article/pii/S0167639323000869
Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners
Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as either healthy or pathologically affected. Nevertheless, the latter state is not always the result of a single laryngeal issue: multiple disorders may coexist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate five multi-label classification methods based on problem transformation, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance, with Random Forest (RF) and Support Vector Machine (SVM) as base-learners, in addition to a Deep Neural Network (DNN) as an algorithm adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke’s Edema, Vox Senilis, and Central Laryngeal Motion Disorder. The input consists of three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception. We observed that the DNN-based approach with SE-based feature vectors presented the best F1-score among the tested methods, i.e., 0.943, as the averaged value from all the balancing scenarios, on the Saarbrücken Voice Database (SVD) and considering a 20% balancing rate with the Synthetic Minority Over-sampling Technique (SMOTE). Finally, most false negatives occurred for laryngitis, which may explain why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state of the art in handcrafted, acoustic-based, non-invasive diagnosis of voice disorders.
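As a rough illustration of the ingredients named in the abstract, the sketch below computes frame-level signal energy, zero-crossing rate, and signal entropy, and applies a Label Powerset transformation to multi-label targets. It relies on common textbook definitions, which are an assumption here and may differ from the exact formulations used in the paper; the function names, the frame sizes, and the histogram-based entropy with `bins=8` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def frames(signal, size, hop):
    """Split samples into fixed-size frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def signal_energy(frame):
    """Sum of squared samples (textbook definition, assumed here)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def signal_entropy(frame, bins=8):
    """Shannon entropy (bits) of an amplitude histogram; the bin count is an assumption."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no amplitude uncertainty
    width = (hi - lo) / bins
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in frame)
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def label_powerset(label_rows):
    """Label Powerset: map each distinct label combination to a single class id."""
    classes = {}
    ids = [classes.setdefault(tuple(row), len(classes)) for row in label_rows]
    return ids, classes


if __name__ == "__main__":
    # Synthetic 100 Hz tone at 8 kHz, standing in for a voice recording.
    rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(1600)]
    for frame in frames(tone, size=400, hop=400):
        print(signal_energy(frame), zero_crossing_rate(frame), signal_entropy(frame))

    # Columns: Dysphonia, Laryngitis, Reinke's Edema, Vox Senilis,
    # Central Laryngeal Motion Disorder (1 = present). Toy labels, not real data.
    targets = [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [1, 0, 0, 0, 0]]
    print(label_powerset(targets))
```

Label Powerset turns the multi-label problem into an ordinary multi-class one, so any single-label base-learner (such as the RF or SVM mentioned above) can be trained on the resulting class ids. Binary Relevance would instead train one binary classifier per label column.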
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.