Robust text-dependent speaker verification system using gender aware Siamese-Triplet Deep Neural Network.

IF 1.1 | Zone 3, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Network-Computation in Neural Systems | Pub Date: 2024-12-29 | DOI: 10.1080/0954898X.2024.2438128

Sanghamitra V Arora

Citations: 0

Abstract

Speaker verification in text-dependent scenarios is critical for high-security applications but faces challenges such as voice quality variations, linguistic diversity, and gender-related pitch differences, which affect authentication accuracy. This paper introduces a Gender-Aware Siamese-Triplet Network-Deep Neural Network (ST-DNN) architecture to address these challenges. The Gender-Aware Network utilizes Convolutional 2D layers with ReLU activation for initial feature extraction, followed by multi-fusion dense skip connections and batch normalization to integrate features across different depths, enhancing discrimination between male and female speakers. A bottleneck layer compresses feature maps to capture gender-related characteristics effectively. For enhanced speaker verification, separate male and female ST-DNN models are used, each incorporating Individual, Siamese, and Triplet Networks. The Individual Network extracts unique utterance characteristics, the Siamese Network compares speech sample pairs for speaker identity, and the Triplet Network ensures closely grouped embeddings of samples from the same speaker, facilitating precise verification. Experimental results on RSR2015 and RedDots Challenge 2016 datasets demonstrate significant improvements, with reductions in Equal Error Rate (EER) ranging from 32.31% to 54.55% for males and 33.73% to 38.98% for females, and reductions in MinDCF from 53.47% to 86.36% and 39.46% to 71.19%, respectively, validating the efficacy of the ST-DNN in real-world applications.
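
The abstract names two ideas that are easy to show in miniature: a triplet objective that pulls same-speaker embeddings together, and the Equal Error Rate (EER) used to report results. The following is a minimal sketch, not the authors' implementation. It assumes cosine distance between embeddings, a margin of 0.2, and a simple threshold sweep for the EER; the function names and toy values are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative (different speaker) is at least
    `margin` farther from the anchor than the positive (same speaker).
    The margin value is an assumption, not taken from the paper.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Scores are similarities (higher means "more likely the same speaker").
    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, and reported as their average.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy embeddings (hypothetical): the positive matches the anchor exactly,
# the negative is orthogonal to it, so the margin is already satisfied.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Toy verification scores (hypothetical): one target trial scores below
# one impostor trial, so the two error rates meet at 25%.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

In this toy case the triplet loss is zero because the negative is already far enough from the anchor, and one overlapping target/impostor pair out of four trials on each side gives an EER of 0.25. A production evaluation would typically interpolate the FAR/FRR curves rather than pick the nearest observed threshold.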

Source journal

Network-Computation in Neural Systems (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 3.70

Self-citation rate: 1.30%

Review time: >12 weeks

Journal description: Network: Computation in Neural Systems welcomes research papers that integrate theoretical neuroscience with experimental data, with an emphasis on cutting-edge technologies. Authors and researchers are invited to contribute work in the following areas:

Theoretical Neuroscience: neural network modeling approaches that elucidate brain function.

Neural Networks in Data Analysis and Pattern Recognition: the use of neural networks for data analysis and pattern recognition, including but not limited to image analysis and speech processing applications.

Neural Networks in Control Systems: the use of neural networks in control systems, including robotics, state estimation, fault detection, and diagnosis.

Analysis of Neurophysiological Data: analysis of neurophysiological data obtained from experimental studies involving animals.

Analysis of Experimental Data on the Human Brain: papers analyzing experimental data from studies of the human brain that use imaging techniques such as MRI, fMRI, EEG, and PET.

Neurobiological Foundations of Consciousness: the neural bases of consciousness in the brain and its simulation in machines.