{"title":"Multitask and Transfer Learning Approach for Joint Classification and Severity Estimation of Dysphonia","authors":"Dosti Aziz;Sztahó Dávid","doi":"10.1109/JTEHM.2023.3340345","DOIUrl":null,"url":null,"abstract":"Objective: Despite speech being the primary communication medium, it carries valuable information about a speaker’s health, emotions, and identity. Various conditions can affect the vocal organs, leading to speech difficulties. Extensive research has been conducted by voice clinicians and academia in speech analysis. Previous approaches primarily focused on one particular task, such as differentiating between normal and dysphonic speech, classifying different voice disorders, or estimating the severity of voice disorders. Methods and procedures: This study proposes an approach that combines transfer learning and multitask learning (MTL) to simultaneously perform dysphonia classification and severity estimation. Both tasks use a shared representation; network is learned from these shared features. We employed five computer vision models and changed their architecture to support multitask learning. Additionally, we conducted binary ‘healthy vs. dysphonia’ and multiclass ‘healthy vs. organic and functional dysphonia’ classification using multitask learning, with the speaker’s sex as an auxiliary task. Results: The proposed method achieved improved performance across all classification metrics compared to single-task learning (STL), which only performs classification or severity estimation. Specifically, the model achieved F1 scores of 93% and 90% in MTL and STL, respectively. Moreover, we observed considerable improvements in both classification tasks by evaluating beta values associated with the weight assigned to the sex-predicting auxiliary task. MTL achieved an accuracy of 77% compared to the STL score of 73.2%. However, the performance of severity estimation in MTL was comparable to STL. Conclusion: Our goal is to improve how voice pathologists and clinicians understand patients’ conditions, make it easier to track their progress, and enhance the monitoring of vocal quality and treatment procedures. Clinical and Translational Impact Statement: By integrating both classification and severity estimation of dysphonia using multitask learning, we aim to enable clinicians to gain a better understanding of the patient’s situation, effectively monitor their progress and voice quality.","PeriodicalId":54255,"journal":{"name":"IEEE Journal of Translational Engineering in Health and Medicine-Jtehm","volume":"12 ","pages":"233-244"},"PeriodicalIF":3.7000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10347235","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Translational Engineering in Health and Medicine-Jtehm","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10347235/","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Objective: Despite speech being the primary communication medium, it carries valuable information about a speaker’s health, emotions, and identity. Various conditions can affect the vocal organs, leading to speech difficulties. Extensive research has been conducted by voice clinicians and academia in speech analysis. Previous approaches primarily focused on one particular task, such as differentiating between normal and dysphonic speech, classifying different voice disorders, or estimating the severity of voice disorders. Methods and procedures: This study proposes an approach that combines transfer learning and multitask learning (MTL) to simultaneously perform dysphonia classification and severity estimation. Both tasks use a shared representation; network is learned from these shared features. We employed five computer vision models and changed their architecture to support multitask learning. Additionally, we conducted binary ‘healthy vs. dysphonia’ and multiclass ‘healthy vs. organic and functional dysphonia’ classification using multitask learning, with the speaker’s sex as an auxiliary task. Results: The proposed method achieved improved performance across all classification metrics compared to single-task learning (STL), which only performs classification or severity estimation. Specifically, the model achieved F1 scores of 93% and 90% in MTL and STL, respectively. Moreover, we observed considerable improvements in both classification tasks by evaluating beta values associated with the weight assigned to the sex-predicting auxiliary task. MTL achieved an accuracy of 77% compared to the STL score of 73.2%. However, the performance of severity estimation in MTL was comparable to STL. Conclusion: Our goal is to improve how voice pathologists and clinicians understand patients’ conditions, make it easier to track their progress, and enhance the monitoring of vocal quality and treatment procedures. Clinical and Translational Impact Statement: By integrating both classification and severity estimation of dysphonia using multitask learning, we aim to enable clinicians to gain a better understanding of the patient’s situation, effectively monitor their progress and voice quality.
期刊介绍:
The IEEE Journal of Translational Engineering in Health and Medicine is an open access product that bridges the engineering and clinical worlds, focusing on detailed descriptions of advanced technical solutions to a clinical need along with clinical results and healthcare relevance. The journal provides a platform for state-of-the-art technology directions in the interdisciplinary field of biomedical engineering, embracing engineering, life sciences and medicine. A unique aspect of the journal is its ability to foster a collaboration between physicians and engineers for presenting broad and compelling real world technological and engineering solutions that can be implemented in the interest of improving quality of patient care and treatment outcomes, thereby reducing costs and improving efficiency. The journal provides an active forum for clinical research and relevant state-of the-art technology for members of all the IEEE societies that have an interest in biomedical engineering as well as reaching out directly to physicians and the medical community through the American Medical Association (AMA) and other clinical societies. The scope of the journal includes, but is not limited, to topics on: Medical devices, healthcare delivery systems, global healthcare initiatives, and ICT based services; Technological relevance to healthcare cost reduction; Technology affecting healthcare management, decision-making, and policy; Advanced technical work that is applied to solving specific clinical needs.