Deep Learning-based Analysis of Voiceprint Data Mining

International Journal of Applied Science and Engineering Research. Pub Date: 2022-01-30. DOI: 10.56828/jser.2022.1.1.1

Jacky Chun-ki Tang

Citations: 0

Abstract

In the information age, intelligent data mining methods, represented by deep learning, play an important role in many fields, and it is necessary to study how to use them efficiently to extract valuable information from massive data. Open-set voiceprint recognition can be realized with intelligent data mining technology, and achieving rapid and accurate identification of a speaker's identity is of great practical significance. Because traditional voiceprint recognition methods cannot adequately distinguish in-set speakers from out-of-set speakers, they often produce a high false recognition rate. Two problems have become the bottleneck of open-set voiceprint recognition: mining parameters that carry more speaker-specific characteristics, and calculating the decision threshold. This paper therefore adopts a deep belief network (DBN), built by stacking three layers of restricted Boltzmann machines (RBMs), as the deep acoustic feature extractor. The basic acoustic features, 24-dimensional mel-frequency cepstral coefficients (MFCCs), are mapped to a 256-dimensional feature space, yielding deep acoustic feature parameters that carry more speaker-specific characteristics. An open-set adaptive threshold calculation algorithm is then presented. The similarity values of the deep acoustic features are calculated with a Gaussian mixture model (GMM), and the Otsu algorithm is used to find the maximum inter-class variance of those similarity values; the similarity value at which the inter-class variance is maximal is taken as the best threshold. Experimental tests show that the proposed deep learning-based threshold calculation algorithm achieves a lower false rejection rate.
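The threshold step described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `otsu_threshold` and `is_in_set` are hypothetical, the scores are made-up stand-ins for GMM similarity values, and placing the threshold at the midpoint between two neighbouring scores is one possible convention rather than the paper's stated method.

```python
from typing import Sequence


def otsu_threshold(scores: Sequence[float]) -> float:
    """Pick the cut that maximizes inter-class variance (Otsu's criterion).

    Every gap between two distinct neighbouring scores is tried as a cut.
    For each cut the scores split into a low class and a high class, and
    the inter-class variance is w0 * w1 * (mu0 - mu1) ** 2, where w0, w1
    are the class proportions and mu0, mu1 the class means.
    """
    s = sorted(scores)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two similarity scores")

    total = sum(s)
    best_threshold = s[0]   # fallback when all scores are identical
    best_variance = -1.0
    low_sum = 0.0

    for i in range(1, n):
        low_sum += s[i - 1]
        if s[i] == s[i - 1]:
            continue  # no cut fits between two equal scores
        w0 = i / n
        w1 = 1.0 - w0
        mu0 = low_sum / i
        mu1 = (total - low_sum) / (n - i)
        variance = w0 * w1 * (mu0 - mu1) ** 2
        if variance > best_variance:
            best_variance = variance
            # Assumption: use the midpoint of the gap as the threshold.
            best_threshold = (s[i - 1] + s[i]) / 2.0

    return best_threshold


def is_in_set(score: float, threshold: float) -> bool:
    """Accept a speaker as in-set when the similarity reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Illustrative scores only: three low (out-of-set) and three high (in-set).
    demo_scores = [-9.0, -8.5, -8.0, -2.0, -1.5, -1.0]
    t = otsu_threshold(demo_scores)
    print("threshold:", t)
    print("accept -1.2:", is_in_set(-1.2, t))
    print("accept -8.7:", is_in_set(-8.7, t))
```

Because the threshold is recomputed from the observed similarity values instead of being fixed in advance, it adapts to the score distribution, which is the sense in which the abstract calls the algorithm adaptive.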
