Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

IF 0.6 | Zone 4 (Physics and Astrophysics) | JCR Q4 ACOUSTICS | Archives of Acoustics | Pub Date: 2024-03-19 | DOI: 10.24425/aoa.2024.148768
Yunfei Zi, Shengwu Xiong
Citations: 0

Abstract

This work addresses two weaknesses of existing short-duration speaker recognition: feature sparsity and insufficiently discriminative acoustic features. We propose the Bark-scaled Gauss and linear filter bank superposition cepstral coefficients (BGLCC) and the multidimensional central difference (MDCD) acoustic feature extraction method. The Bark-scaled Gauss filter bank concentrates on low-frequency information, whereas the linear filter bank is uniformly distributed across frequency; superposing the two therefore yields richer and more discriminative acoustic features from short-duration audio signals. In addition, the MDCD method better captures speakers' dynamic features, which improves short-utterance speaker verification performance. Extensive experiments are conducted on short-duration, text-independent speaker verification datasets generated from the VoxCeleb, SITW, and NIST SRE corpora, which contain speech samples of diverse lengths and scenarios. The results show that the proposed method outperforms the existing acoustic feature extraction approach by at least 10% on the test set. Ablation experiments further show that the proposed approaches achieve substantial improvements over prior methods.
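The abstract does not give the filter definitions or the exact MDCD formula, so the following is only a rough NumPy sketch of the general idea, not the authors' implementation. It assumes the Traunmüller approximation of the Bark scale, Gaussian widths derived from the spacing of neighbouring centres, triangular filters for the linear bank, "superposition" read as stacking both banks ahead of a single DCT, and a plain first-order central difference along time in place of MDCD. All function names and parameter values (`bglcc_sketch`, `n_filters=20`, `n_ceps=13`, and so on) are hypothetical.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmueller approximation of the Bark scale (an assumed choice).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)

def gaussian_bank(centers_hz, freqs_hz):
    # One Gaussian per centre; the width follows the local centre spacing (assumption).
    sigma = np.gradient(centers_hz) / 2.0
    z = (freqs_hz[None, :] - centers_hz[:, None]) / sigma[:, None]
    return np.exp(-0.5 * z ** 2)

def linear_bank(n_filters, freqs_hz, f_min, f_max):
    # Triangular filters whose edges are uniformly spaced in Hz.
    edges = np.linspace(f_min, f_max, n_filters + 2)
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    f = freqs_hz[None, :]
    tri = np.minimum((f - lo) / (mid - lo), (hi - f) / (hi - mid))
    return np.clip(tri, 0.0, None)

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    # Hamming-windowed frames followed by the squared magnitude of the real FFT.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def dct2(x, n_out):
    # Unnormalised DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n)[None, :] + 0.5) / n)
    return x @ basis.T

def bglcc_sketch(signal, sr=16000, n_filters=20, n_ceps=13, n_fft=512):
    power = power_spectrogram(signal, n_fft=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    f_min, f_max = 50.0, sr / 2.0
    # Centres uniformly spaced on the Bark scale, hence dense at low frequencies.
    centers = bark_to_hz(np.linspace(hz_to_bark(f_min), hz_to_bark(f_max), n_filters))
    # "Superposition" is assumed here to mean stacking both banks.
    bank = np.vstack([gaussian_bank(centers, freqs),
                      linear_bank(n_filters, freqs, f_min, f_max)])
    log_energy = np.log(power @ bank.T + 1e-10)
    return dct2(log_energy, n_ceps)

def central_difference(feats):
    # First-order central difference along time, with edge frames repeated.
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Usage: one second of white noise stands in for a short utterance.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
static = bglcc_sketch(signal)
delta = central_difference(static)
features = np.hstack([static, delta, central_difference(delta)])
print(features.shape)  # (98, 39)
```

Stacking keeps the dense low-frequency coverage of the Bark-spaced bank while the uniformly spaced bank contributes bands in the upper part of the spectrum. Summing the two banks' outputs would be another possible reading of "superposition".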
Source journal
Archives of Acoustics (Physics: Acoustics)
CiteScore: 1.80
Self-citation rate: 11.10%
Articles published: 0
Review time: 6-12 weeks
Journal description: Archives of Acoustics is a peer-reviewed quarterly journal that publishes original research papers from all areas of acoustics, including acoustical measurements and instrumentation, acoustics of music, acousto-optics, architectural, building and environmental acoustics, bioacoustics, electroacoustics, linear and nonlinear acoustics, noise and vibration, physical and chemical effects of sound, physiological acoustics, psychoacoustics, quantum acoustics, speech processing and communication systems, speech production and perception, transducers, ultrasonics, and underwater acoustics.