Advancing Voice Biometrics for Dysarthria Speakers Using Multitaper LFCC and Voice Conversion Data Augmentation

IF 6.3 | Zone 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | IEEE Transactions on Information Forensics and Security | Pub Date: 2024-10-23 | DOI: 10.1109/TIFS.2024.3484661
Shinimol Salim;Waquar Ahmad
IEEE Transactions on Information Forensics and Security, vol. 19, pp. 10114-10129. Journal article, not open access. https://ieeexplore.ieee.org/document/10731900/
Citations: 0

Abstract

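Patients with dysarthria and physical impairments face challenges with traditional user interfaces. An Automatic Speaker Verification (ASV) system can enhance accessibility by replacing complex authentication methods and enabling voice biometrics in various applications for patients with dysarthria. This study focuses on enhancing accessibility for patients with dysarthria through an ASV system. It proposes a novel low-variance Multitaper Linear Frequency Cepstral Coefficients (MTLFCC) feature. An ASV system for patients with dysarthria is implemented using voice conversion data augmentation within a DNN framework. An extensive analysis compares various multitaper techniques and taper weight choices using the Thomson multitaper method, specifically for verifying patients with dysarthria as speakers. The impact of voice conversion through a cycle-consistent generative adversarial network (CycleGAN) is also examined: the acoustic attributes of control speech are modified to make it perceptually similar to dysarthric speech, and the implications for dysarthric ASV are assessed. Furthermore, system performance is analyzed across dysarthria severity levels to gain insight into how the selected multitaper parameters influence the outcomes. This study pioneers the use of MTLFCC features for ASV in the context of dysarthria, offering a novel approach to improve accessibility for this group.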
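The abstract does not give the feature pipeline in detail, so the following is only a minimal sketch of how a multitaper LFCC could be computed: a Thomson multitaper power spectrum (DPSS tapers), a linearly spaced triangular filterbank, a log, and a DCT. The function names, frame sizes, taper count, time-bandwidth product, and the uniform taper weights are all illustrative assumptions, not the authors' settings; the paper compares several weighting schemes that are not reproduced here.

```python
import numpy as np
from scipy.signal.windows import dpss
from scipy.fft import dct


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced linearly from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    edges = np.linspace(0.0, sample_rate / 2, n_filters + 2)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def multitaper_spectrum(frame, tapers, weights, n_fft):
    """Weighted average of the power spectra obtained with each taper."""
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    return weights @ spectra


def mtlfcc(signal, sample_rate=16000, frame_len=400, hop=160,
           n_fft=512, n_tapers=6, nw=3.5, n_filters=20, n_ceps=13):
    """Hypothetical MTLFCC extractor; all defaults are assumptions."""
    tapers = dpss(frame_len, nw, Kmax=n_tapers)       # shape (n_tapers, frame_len)
    weights = np.full(n_tapers, 1.0 / n_tapers)       # assumption: uniform weights
    fbank = linear_filterbank(n_filters, n_fft, sample_rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = multitaper_spectrum(frame, tapers, weights, n_fft)
        log_energy = np.log(fbank @ power + 1e-10)    # small floor avoids log(0)
        feats.append(dct(log_energy, type=2, norm="ortho")[:n_ceps])
    return np.array(feats)                            # shape (n_frames, n_ceps)
```

Averaging over several orthogonal tapers is what lowers the variance of the spectral estimate compared with a single Hamming-windowed periodogram; swapping the uniform `weights` vector for another scheme changes only one line of the sketch.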
Source journal
IEEE Transactions on Information Forensics and Security (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 7.40%
Articles published: 234
Review time: 6.5 months
About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.
Latest articles from this journal
Attackers Are Not the Same! Unveiling the Impact of Feature Distribution on Label Inference Attacks
Backdoor Online Tracing With Evolving Graphs
LHADRO: A Robust Control Framework for Autonomous Vehicles Under Cyber-Physical Attacks
Towards Mobile Palmprint Recognition via Multi-view Hierarchical Graph Learning
Succinct Hash-based Arbitrary-Range Proofs