International Journal of Speech Technology: Latest Articles

Fusion of speech and handwritten signatures biometrics for person identification
Q1 Arts and Humanities Pub Date: 2023-11-01 DOI: 10.1007/s10772-023-10052-x
Ahmad A. M. Abushariah, Mohammad A. M. Abushariah, Teddy Surya Gunawan, J. Chebil, Assal A. M. Alqudah, Hua-Nong Ting, Mumtaz Begum Peer Mustafa
Citations: 0
A hybrid adaptive neuro-fuzzy approach for automatic spoken digit recognition
Q1 Arts and Humanities Pub Date: 2023-10-31 DOI: 10.1007/s10772-023-10057-6
Irshed Hussain, Pinki Roy
Citations: 0
Multi-task learning for X-vector based speaker recognition
Q1 Arts and Humanities Pub Date: 2023-10-28 DOI: 10.1007/s10772-023-10058-5
Yingjie Zhang, Liu Liu
Abstract: In this paper, we propose a speaker recognition system that leverages multi-task learning and feature integration (MTFI) to improve the performance of x-vector based speaker recognition models. It is important to integrate complementary information from different features such as MFCC, Fbank, spectrogram, and LPCC, because a single feature usually cannot cover all information about a speaker and generalizes insufficiently. Since the x-vector model outputs the affine transformation values of the penultimate hidden layer in the trained model, the parameter distribution of this layer should be stable and, when switching tasks, should not be affected by tasks outside the current branch. Therefore, we propose a shared unit (SU) in multi-task learning, which is useful for sharing common representations and other auxiliary tasks. Then, an attention mechanism is designed to calculate the frame weights in the statistical pooling layer, so as to enhance key-frame information. The proposed system achieved an EER of 0.98% on VoxCeleb1, and average score fusion obtained an EER of 0.65%.
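The attention-weighted statistical pooling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the attention scores are produced, so this example simply takes one unnormalized score per frame as input; the function name `attentive_stats_pooling` and its arguments are hypothetical, and this is a generic attentive statistics pooling, not necessarily the authors' exact design.

```python
import math


def attentive_stats_pooling(frames, scores):
    """Pool T frame vectors into one vector of weighted mean and std.

    frames: list of T lists, each holding D floats (frame-level features).
    scores: list of T floats (unnormalized attention scores, assumed given).
    Returns (weights, pooled), where pooled has length 2 * D.
    """
    # Softmax over frames turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    # Attention-weighted mean of each feature dimension.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Attention-weighted variance, clamped at zero against rounding error.
    var = [
        max(sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames)), 0.0)
        for d in range(dim)
    ]
    std = [math.sqrt(v) for v in var]
    # Concatenate mean and std into the utterance-level representation.
    return weights, mean + std
```

With equal scores the weights are uniform and the result reduces to ordinary statistics pooling; a frame with a much larger score dominates both the mean and the standard deviation, which is how key frames are emphasized.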
Citations: 0
Unsupervised spoken term discovery using pseudo lexical induction
Q1 Arts and Humanities Pub Date: 2023-10-26 DOI: 10.1007/s10772-023-10049-6
P. Sudhakar, K. Sreenivasa Rao, Pabitra Mitra
Citations: 0
Bird species recognition using spiking neural network along with distance based fuzzy co-clustering
Q1 Arts and Humanities Pub Date: 2023-09-13 DOI: 10.1007/s10772-023-10040-1
Ricky Mohanty, Hemanta Kumar Bhuyan, Subhendu Kumar Pani, Vinayakumar Ravi, Moez Krichen
Citations: 0
Towards modeling raw speech in gender identification of children using sincNet over ERB scale
Q1 Arts and Humanities Pub Date: 2023-09-08 DOI: 10.1007/s10772-023-10039-8
Kodali Radha, Mohan Bansal
Citations: 0
Application of probabilistic neural network for speech emotion recognition
Q1 Arts and Humanities Pub Date: 2023-09-06 DOI: 10.1007/s10772-023-10037-w
Shrikala Deshmukh, Preeti Gupta
Citations: 0
Automatic age recognition, call-type classification, and speaker identification of Zebra Finches (Taeniopygia guttata) using hidden Markov models (HMMs)
Q1 Arts and Humanities Pub Date: 2023-09-04 DOI: 10.1007/s10772-023-10041-0
Marek B. Trawicki
Citations: 0
Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture
Q1 Arts and Humanities Pub Date: 2023-09-01 DOI: 10.1007/s10772-023-10044-x
V. Srinivasarao
Citations: 0
Deep learning structure for emotion prediction using MFCC from native languages
Q1 Arts and Humanities Pub Date: 2023-09-01 DOI: 10.1007/s10772-023-10047-8
A. Suresh Rao, A. Pramod Reddy, Pragathi Vulpala, K. Shwetha Rani, P. Hemalatha
Citations: 0