Deep Learning-based Analysis of Voiceprint Data Mining

Jacky Chun-ki Tang
Journal: International Journal of Applied Science and Engineering Research
Published: 2022-01-30
DOI: https://doi.org/10.56828/jser.2022.1.1.1
Citations: 0

Abstract

In the information age, intelligent data mining methods, represented by deep learning, play an important role in many fields, and it is necessary to study how to use them efficiently to extract valuable information from massive data. Open-set voiceprint recognition can be realized with intelligent data mining technology, and rapid, accurate identification of a speaker's identity is of great practical significance. Because traditional voiceprint recognition methods cannot adequately distinguish in-set speakers from out-of-set speakers, they often yield a high false recognition rate. Two problems form the bottleneck of open-set voiceprint recognition: mining parameters that carry more speaker-specific characteristics, and calculating the decision threshold. This paper therefore adopts a deep belief network (DBN), built by stacking three layers of restricted Boltzmann machines (RBMs), as the deep acoustic feature extractor. The 24-dimensional mel-frequency cepstral coefficients (MFCCs) that serve as basic acoustic features are mapped into a 256-dimensional feature space, yielding deep acoustic feature parameters that carry more speaker-specific characteristics. An open-set adaptive threshold calculation algorithm is then given: the similarity values of the deep acoustic features are computed with a Gaussian mixture model (GMM), and the Otsu algorithm searches for the similarity value at which the inter-class variance is maximal; that value is taken as the best threshold. Experimental tests show that the proposed deep-learning-based threshold calculation algorithm achieves a lower false rejection rate.
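
The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, which the abstract does not show: the function name `otsu_threshold`, the bin count, and the synthetic scores standing in for GMM similarity values are all assumptions made for illustration. The sketch applies Otsu's criterion to one-dimensional scores and returns the split that maximizes the inter-class variance w0 * w1 * (mu0 - mu1)^2.

```python
import numpy as np


def otsu_threshold(scores, bins=64):
    """Return the score split that maximizes inter-class variance (Otsu)."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    prob = hist / hist.sum()

    # For a split after bin k: weight and unnormalized mean of the low class.
    w0 = np.cumsum(prob)
    m0 = np.cumsum(prob * centers)
    total_mean = m0[-1]
    w1 = 1.0 - w0

    # Inter-class variance, equivalent to w0 * w1 * (mu0 - mu1)^2.
    # Splits that leave one class (almost) empty keep a variance of zero.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    var_between = np.zeros_like(w0)
    var_between[valid] = (
        (total_mean * w0[valid] - m0[valid]) ** 2 / (w0[valid] * w1[valid])
    )

    k = int(np.argmax(var_between))
    return float(edges[k + 1])  # upper edge of bin k separates the classes


# Synthetic stand-ins for similarity values (assumption, not the paper's data):
# out-of-set speakers score low, in-set speakers score high.
rng = np.random.default_rng(0)
out_of_set = rng.normal(loc=-2.0, scale=0.5, size=500)
in_set = rng.normal(loc=2.0, scale=0.5, size=500)

threshold = otsu_threshold(np.concatenate([out_of_set, in_set]))
accepted = in_set > threshold       # in-set speakers correctly accepted
rejected = out_of_set <= threshold  # out-of-set speakers correctly rejected
print(f"threshold={threshold:.3f}, "
      f"accept rate={accepted.mean():.3f}, reject rate={rejected.mean():.3f}")
```

Because the threshold is recomputed from whatever scores are supplied, it adapts to the score distribution rather than relying on a fixed, hand-tuned value, which is the sense of "adaptive" assumed in this sketch.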