Scalogram based performance comparison of deep learning architectures for dysarthric speech detection

IF 13.9; Region 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Artificial Intelligence Review; Pub Date: 2025-02-13; DOI: 10.1007/s10462-024-11085-7
Shaik Mulla Shabber, E. P. Sumesh, Vidhya Lavanya Ramachandran
Volume/issue: 58 5; Publication type: Journal Article
Article: https://link.springer.com/article/10.1007/s10462-024-11085-7
PDF: https://link.springer.com/content/pdf/10.1007/s10462-024-11085-7.pdf
Citations: 0

Abstract

Dysarthria, a speech disorder commonly associated with neurological conditions, is difficult to detect early and to diagnose accurately. This study addresses these challenges by applying preprocessing steps, such as noise reduction and normalization, to enhance the quality of raw speech signals and to extract relevant features. Scalogram images generated through the wavelet transform capture the time-frequency characteristics of the speech signal, giving a visual representation of its spectral content over time and providing insight into speech abnormalities related to dysarthria. Fine-tuned deep learning models, including the pre-trained convolutional neural network (CNN) architectures VGG19, DenseNet-121, Xception, and a modified InceptionV3, were optimized with specific hyperparameters using training and validation sets. Transfer learning enables these models to adapt features learned on general image classification tasks to better classify dysarthric speech signals. The study evaluates the models on two public datasets, TORGO and UA-Speech, and on a third dataset collected by the authors and verified by medical practitioners. The CNN models achieve accuracy between 90% and 99%, F1-scores between 0.95 and 0.99, and recall between 0.96 and 0.99, outperforming traditional methods in dysarthria detection. These findings highlight the effectiveness of the proposed approach, which leverages deep learning and scalogram images to advance early diagnosis and healthcare outcomes for individuals with dysarthria.
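
To make the scalogram step concrete, the following is a minimal sketch of a continuous wavelet transform magnitude map in plain Python. The abstract does not say which mother wavelet, scales, or library the authors used, so the complex Morlet wavelet, the `w0 = 6` setting, and the function names `morlet` and `scalogram` are all assumptions chosen for illustration. This is not the authors' implementation, and real pipelines would more likely rely on a dedicated signal processing library.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at unit scale (assumed mother wavelet)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| as one row per analysis frequency, one value per sample.

    signal: list of float samples, fs: sampling rate in Hz,
    freqs: analysis frequencies in Hz (hypothetical choice of scales).
    """
    n = len(signal)
    rows = []
    for f in freqs:
        # Morlet relation between centre frequency and scale (in seconds).
        scale = w0 / (2.0 * math.pi * f)
        # Truncate the wavelet at +/- 4 scale widths, capped by signal length.
        half = min(int(math.ceil(4.0 * scale * fs)), n - 1)
        # Conjugated, scaled wavelet taps with 1/sqrt(scale) normalisation.
        taps = [
            morlet(k / (fs * scale), w0).conjugate() / math.sqrt(scale)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, tap in enumerate(taps):
                idx = i + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * tap
            row.append(abs(acc) / fs)  # 1/fs approximates dt in the integral
        rows.append(row)
    return rows


if __name__ == "__main__":
    fs = 1000
    tone = [math.sin(2 * math.pi * 50 * k / fs) for k in range(400)]
    rows = scalogram(tone, fs, [25.0, 50.0, 100.0])
    middle = [row[200] for row in rows]
    print("strongest band index at mid-signal:", middle.index(max(middle)))
```

Each row of the result can be rendered as one horizontal line of an image, with time on the horizontal axis and frequency on the vertical axis. An image of that kind is what an image classification CNN would take as input under the transfer learning setup the abstract describes.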

Source journal
Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.