One Voice is All You Need: A One-Shot Approach to Recognize Your Voice

Priata Nowshin, Shahriar Rumi Dipto, Intesur Ahmed, Deboraj Chowdhury, Galib Abdun Noor, Amitabha Chakrabarty, Muhammad Tahmeed Abdullah, Moshiur Rahman
Published in: 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)
Publication date: 2022-03-01
DOI: 10.1109/CDMA54072.2022.00022
URL: https://doi.org/10.1109/CDMA54072.2022.00022
Citations: 0

Abstract

One-shot learning has proven effective in computer vision because it can classify accurately from a single labeled example per class and a small amount of training data. In one-shot learning, the model must make accurate predictions given only one sample of each new class. In this paper, we study a strategy for training Siamese neural networks, which use a distinctive structure to automatically evaluate the similarity between inputs. Our goal is to apply one-shot learning to audio classification: we extract specific features, train a Siamese network with triplet loss, and at test time compute the similarity between a support set and a query set. We ran our experiments on the LibriSpeech ASR corpus. We evaluated N-way 1-shot learning and obtained strong results for 2-way (100%), 3-way (95%), 4-way (84%), and 5-way (74%) classification, which outperform existing machine learning models by a large margin. To the best of our knowledge, this may be the first paper to explore one-shot human speech recognition on the LibriSpeech ASR corpus using a Siamese network.
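The two ideas the abstract names, triplet loss for training and N-way 1-shot testing against a support set, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the feature extraction, network architecture, distance measure, or margin, so the Euclidean distance, the margin of 0.2, the function names, and the hand-made 2-D vectors standing in for Siamese-network embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """A common form of triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the anchor is closer to the positive (same
    speaker) than to the negative (different speaker) by at least `margin`.
    The margin value here is an assumption, not taken from the paper.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def classify_one_shot(query, support):
    """Return the label whose single support embedding is nearest to the query.

    `support` maps each of the N class labels to exactly one embedding,
    which is what makes this an N-way 1-shot decision.
    """
    return min(support, key=lambda label: euclidean(query, support[label]))


def n_way_accuracy(episodes):
    """Fraction of episodes classified correctly.

    Each episode is a (query_embedding, support_dict, true_label) tuple.
    """
    correct = sum(classify_one_shot(q, s) == y for q, s, y in episodes)
    return correct / len(episodes)


if __name__ == "__main__":
    # Toy 3-way 1-shot episode with made-up embeddings.
    support = {
        "speaker_a": [0.0, 0.0],
        "speaker_b": [1.0, 1.0],
        "speaker_c": [0.0, 2.0],
    }
    print(classify_one_shot([0.1, 0.2], support))
    # A triplet that already satisfies the margin, and one that violates it.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))
    print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))
```

In a real system the vectors would come from the shared-weight network applied to audio features, and accuracy would be averaged over many randomly drawn episodes for each value of N.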