Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Pub Date: 2024-03-19 DOI: 10.24425/aoa.2024.148768

Yunfei Zi, Shengwu Xiong



Abstract

This work addresses two weaknesses of existing short-duration speaker recognition: feature sparsity and insufficiently discriminative acoustic features. We propose two acoustic feature extraction methods: the Bark-scaled Gauss and linear filter bank superposition cepstral coefficients (BGLCC), and the multidimensional central difference (MDCD). The Bark-scaled Gauss filter bank focuses on low-frequency information, whereas the linear filter bank is uniformly distributed across frequency; superposing the two therefore yields richer and more discriminative acoustic features from short-duration audio signals. In addition, the MDCD method better captures speakers' dynamic features, improving short-utterance speaker verification performance. Extensive experiments are conducted on short-duration, text-independent speaker verification datasets generated from the VoxCeleb, SITW, and NIST SRE corpora, which contain speech samples of diverse lengths and from different scenarios. The results show that the proposed method outperforms existing acoustic feature extraction approaches by at least 10% on the test set. The ablation experiments further show that the proposed approaches achieve substantial improvements over prior methods.
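The abstract does not give the exact BGLCC or MDCD formulas, so the following is only a rough NumPy sketch of the general idea, not the authors' implementation. It assumes that "superposition" means adding a Gaussian filter bank with Bark-spaced centers to a triangular filter bank with linearly spaced centers, and that the dynamic features are first- and second-order central differences along the time axis. The function names, the Traunmüller Bark approximation, the filter widths, and all parameter values are illustrative assumptions.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmueller's approximation of the Bark scale (an assumed choice).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(b):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (b + 0.53) / (26.28 - b)

def power_spectrogram(signal, sr, n_fft=512, win=0.025, hop=0.010):
    # Slice the signal into Hamming-windowed frames and take |FFT|^2.
    win_len, hop_len = int(round(win * sr)), int(round(hop * sr))
    n_frames = 1 + (len(signal) - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def gaussian_bark_bank(n_filters, n_fft, sr):
    # Gaussian filters whose centers are equally spaced on the Bark scale,
    # so they are dense at low frequencies and sparse at high frequencies.
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = bark_to_hz(np.linspace(hz_to_bark(0.0), hz_to_bark(sr / 2), n_filters + 2))
    centers = edges[1:-1]
    sigma = (edges[2:] - edges[:-2]) / 4.0  # assumed width rule
    return np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / sigma[:, None]) ** 2)

def linear_triangular_bank(n_filters, n_fft, sr):
    # Triangular filters with centers uniformly spaced in Hz.
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = np.linspace(0.0, sr / 2, n_filters + 2)
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lo) / (mid - lo)
    falling = (hi - freqs[None, :]) / (hi - mid)
    return np.clip(np.minimum(rising, falling), 0.0, None)

def dct_ii(x, n_out):
    # Unnormalized DCT-II over the last axis, keeping the first n_out terms.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T

def superposed_cepstra(signal, sr, n_filters=24, n_ceps=13, n_fft=512):
    # Assumed reading of "superposition": element-wise sum of the two banks.
    bank = gaussian_bark_bank(n_filters, n_fft, sr) + linear_triangular_bank(n_filters, n_fft, sr)
    log_energy = np.log(power_spectrogram(signal, sr, n_fft) @ bank.T + 1e-10)
    return dct_ii(log_energy, n_ceps)

def central_difference_stack(ceps):
    # np.gradient uses central differences in the interior of the time axis
    # and one-sided differences at the two boundary frames.
    d1 = np.gradient(ceps, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.concatenate([ceps, d1, d2], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)  # stand-in for a short utterance
    feats = central_difference_stack(superposed_cepstra(one_second, 16000))
    print(feats.shape)
```

In this sketch the Bark-spaced Gaussian filters concentrate resolution at low frequencies, while the uniformly spaced triangular filters keep some weight on the upper band, which is one way to picture the compensation described in the abstract. Concatenating the two banks instead of summing them would be an equally plausible reading.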
