Closed-set automatic speaker identification using multi-scale recurrent networks in non-native children.

Kodali Radha, Mohan Bansal
{"title":"Closed-set automatic speaker identification using multi-scale recurrent networks in non-native children.","authors":"Kodali Radha,&nbsp;Mohan Bansal","doi":"10.1007/s41870-023-01224-8","DOIUrl":null,"url":null,"abstract":"<p><p>Children may benefit from automatic speaker identification in a variety of applications, including child security, safety, and education. The key focus of this study is to develop a closed-set child speaker identification system for non-native speakers of English in both text-dependent and text-independent speech tasks in order to track how the speaker's fluency affects the system. The multi-scale wavelet scattering transform is used to compensate for concerns like the loss of high-frequency information caused by the most widely used mel frequency cepstral coefficients feature extractor. The proposed large-scale speaker identification system succeeds well by employing wavelet scattered Bi-LSTM. While this procedure is used to identify non-native children in multiple classes, average values of accuracy, precision, recall, and F-measure are being used to assess the performance of the model in text-independent and text-dependent tasks, which outperforms the existing models.</p>","PeriodicalId":73455,"journal":{"name":"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10023307/pdf/","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41870-023-01224-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Children may benefit from automatic speaker identification in a variety of applications, including child security, safety, and education. The key focus of this study is to develop a closed-set child speaker identification system for non-native speakers of English in both text-dependent and text-independent speech tasks in order to track how the speaker's fluency affects the system. The multi-scale wavelet scattering transform is used to compensate for concerns like the loss of high-frequency information caused by the most widely used mel frequency cepstral coefficients feature extractor. The proposed large-scale speaker identification system succeeds well by employing wavelet scattered Bi-LSTM. While this procedure is used to identify non-native children in multiple classes, average values of accuracy, precision, recall, and F-measure are being used to assess the performance of the model in text-independent and text-dependent tasks, which outperforms the existing models.

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于多尺度循环网络的非母语儿童闭集自动说话人识别。
儿童可以从自动说话人识别在各种应用中受益,包括儿童安全,安全和教育。本研究的重点是为非英语母语者在文本依赖和文本独立语音任务中开发一个封闭集儿童说话人识别系统,以跟踪说话人的流利程度如何影响该系统。采用多尺度小波散射变换弥补了目前应用最广泛的mel频率倒谱系数特征提取器所造成的高频信息丢失等问题。采用小波散射双lstm方法的大规模说话人识别系统取得了良好的效果。虽然该过程用于识别多个班级中的非母语儿童,但准确性、精度、召回率和F-measure的平均值被用于评估模型在文本独立和文本依赖任务中的性能,其性能优于现有模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Convolutional neural network based children recognition system using contactless fingerprints. On utilizing modified TOPSIS with R-norm q-rung picture fuzzy information measure green supplier selection. Adoption of machine learning algorithm for predicting the length of stay of patients (construction workers) during COVID pandemic. Adoption and sustainability of bitcoin and the blockchain technology in Nigeria. Debunking multi-lingual social media posts using deep learning.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1