Scalogram based performance comparison of deep learning architectures for dysarthric speech detection

Artificial Intelligence Review, Vol. 58, No. 5 (IF 10.7; Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-02-13. DOI: 10.1007/s10462-024-11085-7

Shaik Mulla Shabber, E. P. Sumesh, Vidhya Lavanya Ramachandran



Abstract

Dysarthria, a speech disorder commonly associated with neurological conditions, is difficult to detect early and to diagnose accurately. This study addresses these challenges by applying preprocessing steps, such as noise reduction and normalization, to improve the quality of raw speech signals before relevant features are extracted. Scalogram images generated through the wavelet transform capture the time-frequency characteristics of the speech signal, giving a visual representation of its spectral content over time and exposing speech abnormalities related to dysarthria. Pre-trained convolutional neural network (CNN) architectures, including VGG19, DenseNet-121, Xception, and a modified InceptionV3, were fine-tuned, with hyperparameters optimized using training and validation sets. Transfer learning lets these models adapt features learned on general image classification tasks to the classification of dysarthric speech signals. The models are evaluated on two public datasets, TORGO and UA-Speech, and on a third dataset collected by the authors and verified by medical practitioners. The CNN models achieve accuracy ranging from 90% to 99%, F1-scores from 0.95 to 0.99, and recall from 0.96 to 0.99, outperforming traditional methods of dysarthria detection. These findings highlight the effectiveness of combining deep learning with scalogram images to advance early diagnosis and healthcare outcomes for individuals with dysarthria.
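To make the scalogram step concrete, the following is a minimal sketch of a continuous wavelet transform magnitude computed by direct convolution. It is not the authors' pipeline: the abstract does not name the wavelet, so the complex Morlet wavelet, the parameter `w0`, the edge handling, and the function names `morlet` and `scalogram` are all assumptions made for illustration. It is written in plain Python to stay self-contained; in practice a library routine such as PyWavelets' `pywt.cwt` would normally be used instead.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at time t (assumed wavelet, small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample."""
    n = len(signal)
    rows = []
    for s in scales:
        # The Gaussian envelope is negligible beyond about 4 scale widths.
        half = int(math.ceil(4 * s))
        offsets = range(-half, half + 1)
        # Sampled, conjugated wavelet with 1/sqrt(s) normalization.
        kernel = [morlet(k / s, w0).conjugate() / math.sqrt(s) for k in offsets]
        row = []
        for t in range(n):
            acc = 0j
            for j, k in enumerate(offsets):
                idx = t + k
                if 0 <= idx < n:  # treat samples outside the signal as zero
                    acc += signal[idx] * kernel[j]
            row.append(abs(acc))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical test tone: 0.05 cycles per sample, 256 samples.
    tone = [math.sin(2 * math.pi * 0.05 * i) for i in range(256)]
    scales = [5, 10, 19, 40]
    rows = scalogram(tone, scales)
    centre = [row[128] for row in rows]
    print("strongest scale at the centre sample:", scales[centre.index(max(centre))])
```

For a pure tone at frequency f cycles per sample, the response concentrates near scale w0 / (2πf), which is about 19 for the tone above. Rendering the resulting rows as an image (scale on one axis, time on the other) gives the kind of time-frequency picture that an image classifier could take as input.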


Source journal: Artificial Intelligence Review (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 22.00

Self-citation rate: 3.30%

Articles published: 194

Review time: 5.3 months

Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.