Classification of functional dysphonia using the tunable Q wavelet transform

IF 2.4 · CAS Tier 3 (Computer Science) · JCR Q2 (Acoustics) · Speech Communication · Pub Date: 2023-10-06 · DOI: 10.1016/j.specom.2023.102989
Kiran Reddy Mittapalle, Madhu Keerthana Yagnavajjula, Paavo Alku
{"title":"Classification of functional dysphonia using the tunable Q wavelet transform","authors":"Kiran Reddy Mittapalle ,&nbsp;Madhu Keerthana Yagnavajjula ,&nbsp;Paavo Alku","doi":"10.1016/j.specom.2023.102989","DOIUrl":null,"url":null,"abstract":"<div><p>Functional dysphonia (FD) refers to an abnormality in voice quality in the absence of an identifiable lesion. In this paper, we propose an approach based on the tunable Q wavelet transform (TQWT) to automatically classify two types of FD (hyperfunctional dysphonia and hypofunctional dysphonia) from a healthy voice using the acoustic voice signal. Using TQWT, voice signals were decomposed into sub-bands and the entropy values extracted from the sub-bands were utilized as features for the studied 3-class classification problem. In addition, the Mel-frequency cepstral coefficient (MFCC) and glottal features were extracted from the acoustic voice signal and the estimated glottal source signal, respectively. A convolutional neural network (CNN) classifier was trained separately for the TQWT, MFCC and glottal features. Experiments were conducted using voice signals of 57 healthy speakers and 113 FD patients (72 with hyperfunctional dysphonia and 41 with hypofunctional dysphonia) taken from the VOICED database. These experiments revealed that the TQWT features yielded an absolute improvement of 5.5% and 4.5% compared to the baseline MFCC features and glottal features, respectively. Furthermore, the highest classification accuracy (67.91%) was obtained using the combination of the TQWT and glottal features, which indicates the complementary nature of these features.</p></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"155 ","pages":"Article 102989"},"PeriodicalIF":2.4000,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech Communication","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167639323001231","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0

Abstract

Functional dysphonia (FD) refers to an abnormality in voice quality in the absence of an identifiable lesion. In this paper, we propose an approach based on the tunable Q wavelet transform (TQWT) to automatically classify two types of FD (hyperfunctional dysphonia and hypofunctional dysphonia) from a healthy voice using the acoustic voice signal. Using TQWT, voice signals were decomposed into sub-bands and the entropy values extracted from the sub-bands were utilized as features for the studied 3-class classification problem. In addition, the Mel-frequency cepstral coefficient (MFCC) and glottal features were extracted from the acoustic voice signal and the estimated glottal source signal, respectively. A convolutional neural network (CNN) classifier was trained separately for the TQWT, MFCC and glottal features. Experiments were conducted using voice signals of 57 healthy speakers and 113 FD patients (72 with hyperfunctional dysphonia and 41 with hypofunctional dysphonia) taken from the VOICED database. These experiments revealed that the TQWT features yielded an absolute improvement of 5.5% and 4.5% compared to the baseline MFCC features and glottal features, respectively. Furthermore, the highest classification accuracy (67.91%) was obtained using the combination of the TQWT and glottal features, which indicates the complementary nature of these features.
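To make the feature-extraction idea concrete, the sketch below illustrates the "decompose the voice signal into sub-bands, then compute one entropy value per band" pipeline described in the abstract. This is only a minimal illustration: the paper uses the tunable Q wavelet transform (TQWT), for which no reference implementation ships with PyWavelets, so the sketch substitutes a standard dyadic wavelet decomposition (pywt.wavedec) as a stand-in for the TQWT sub-bands. The function name, wavelet choice, and parameter values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code): per-sub-band Shannon entropy features.
# TQWT is not available in PyWavelets, so pywt.wavedec is used here as a
# stand-in decomposition; wavelet "db4" and 6 levels are arbitrary choices.
import numpy as np
import pywt


def subband_entropies(signal, wavelet="db4", levels=6, eps=1e-12):
    """Decompose `signal` into sub-bands and return one Shannon entropy value
    per sub-band (the feature vector that would go to a classifier)."""
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    feats = []
    for band in coeffs:
        energy = np.square(band)
        p = energy / (energy.sum() + eps)            # normalized energy distribution
        feats.append(-np.sum(p * np.log2(p + eps)))  # Shannon entropy of the band
    return np.asarray(feats)


if __name__ == "__main__":
    # Toy example: a 1-second synthetic frame at 8 kHz instead of a real voice signal.
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.randn(fs)
    print(subband_entropies(x))  # (levels + 1)-dimensional feature vector
```

In the paper, the per-band entropies are computed from TQWT sub-bands and combined with MFCC and glottal features before being passed to a CNN classifier; the sketch above stops at the feature vector.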

Source journal
Speech Communication
Category: Engineering & Technology / Computer Science: Interdisciplinary Applications
CiteScore: 6.80
Self-citation rate: 6.20%
Articles published: 94
Review time: 19.2 weeks
Journal description: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.
Latest articles in this journal
A new universal camouflage attack algorithm for intelligent speech system
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition
AFP-Conformer: Asymptotic feature pyramid conformer for spoofing speech detection
A robust temporal map of speech monitoring from planning to articulation
The combined effects of bilingualism and musicianship on listeners’ perception of non-native lexical tones