Pub Date : 2023-11-01DOI: 10.1007/s10772-023-10052-x
Ahmad A. M. Abushariah, Mohammad A. M. Abushariah, Teddy Surya Gunawan, J. Chebil, Assal A. M. Alqudah, Hua-Nong Ting, Mumtaz Begum Peer Mustafa
{"title":"Fusion of speech and handwritten signatures biometrics for person identification","authors":"Ahmad A. M. Abushariah, Mohammad A. M. Abushariah, Teddy Surya Gunawan, J. Chebil, Assal A. M. Alqudah, Hua-Nong Ting, Mumtaz Begum Peer Mustafa","doi":"10.1007/s10772-023-10052-x","DOIUrl":"https://doi.org/10.1007/s10772-023-10052-x","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135325537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-28DOI: 10.1007/s10772-023-10058-5
Yingjie Zhang, Liu Liu
Abstract In this paper, we propose a speaker recognition system that leverages multi-task learning and features integration (MTFI), to improve the performance of x-vector based speaker recognition models. It is important to integrate complementary information from different features such as MFCC, Fbank, spectrogram and LPCC, as often a single feature usually cannot cover all information about a speaker and generalization is insufficient. Since the x-vector model outputs affine transformation values with the penultimate hidden layer in the trained model, the parameter distribution of this layer should be stable and should not be affected by tasks that are not current branches when switching tasks. Therefore, we propose a shared unit (SU) in multi-task learning, which is useful for sharing common representations and other auxiliary tasks. Then, an attention mechanism is designed to calculate the frame weight in the statistical pooling layer, so as to enhance the key frame information. The proposed system had an EER of 0.98% in voxceleb1 and the average score fusion obtained the EER of 0.65%.
{"title":"Multi-task learning for X-vector based speaker recognition","authors":"Yingjie Zhang, Liu Liu","doi":"10.1007/s10772-023-10058-5","DOIUrl":"https://doi.org/10.1007/s10772-023-10058-5","url":null,"abstract":"Abstract In this paper, we propose a speaker recognition system that leverages multi-task learning and features integration (MTFI), to improve the performance of x-vector based speaker recognition models. It is important to integrate complementary information from different features such as MFCC, Fbank, spectrogram and LPCC, as often a single feature usually cannot cover all information about a speaker and generalization is insufficient. Since the x-vector model outputs affine transformation values with the penultimate hidden layer in the trained model, the parameter distribution of this layer should be stable and should not be affected by tasks that are not current branches when switching tasks. Therefore, we propose a shared unit (SU) in multi-task learning, which is useful for sharing common representations and other auxiliary tasks. Then, an attention mechanism is designed to calculate the frame weight in the statistical pooling layer, so as to enhance the key frame information. The proposed system had an EER of 0.98% in voxceleb1 and the average score fusion obtained the EER of 0.65%.","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136158369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-26DOI: 10.1007/s10772-023-10049-6
P. Sudhakar, K. Sreenivasa Rao, Pabitra Mitra
{"title":"Unsupervised spoken term discovery using pseudo lexical induction","authors":"P. Sudhakar, K. Sreenivasa Rao, Pabitra Mitra","doi":"10.1007/s10772-023-10049-6","DOIUrl":"https://doi.org/10.1007/s10772-023-10049-6","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134910240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bird species recognition using spiking neural network along with distance based fuzzy co-clustering","authors":"Ricky Mohanty, Hemanta Kumar Bhuyan, Subhendu Kumar Pani, Vinayakumar Ravi, Moez Krichen","doi":"10.1007/s10772-023-10040-1","DOIUrl":"https://doi.org/10.1007/s10772-023-10040-1","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135781930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-08DOI: 10.1007/s10772-023-10039-8
Kodali Radha, Mohan Bansal
{"title":"Towards modeling raw speech in gender identification of children using sincNet over ERB scale","authors":"Kodali Radha, Mohan Bansal","doi":"10.1007/s10772-023-10039-8","DOIUrl":"https://doi.org/10.1007/s10772-023-10039-8","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46612698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-04DOI: 10.1007/s10772-023-10041-0
Marek B. Trawicki
{"title":"Automatic age recognition, call-type classification, and speaker identification of Zebra Finches (Taeniopygia guttata) using hidden Markov models (HMMs)","authors":"Marek B. Trawicki","doi":"10.1007/s10772-023-10041-0","DOIUrl":"https://doi.org/10.1007/s10772-023-10041-0","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47095049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-01DOI: 10.1007/s10772-023-10044-x
V. Srinivasarao
{"title":"Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture","authors":"V. Srinivasarao","doi":"10.1007/s10772-023-10044-x","DOIUrl":"https://doi.org/10.1007/s10772-023-10044-x","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135641306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-01DOI: 10.1007/s10772-023-10047-8
A. Suresh Rao, A. Pramod Reddy, Pragathi Vulpala, K. Shwetha Rani, P. Hemalatha
{"title":"Deep learning structure for emotion prediction using MFCC from native languages","authors":"A. Suresh Rao, A. Pramod Reddy, Pragathi Vulpala, K. Shwetha Rani, P. Hemalatha","doi":"10.1007/s10772-023-10047-8","DOIUrl":"https://doi.org/10.1007/s10772-023-10047-8","url":null,"abstract":"","PeriodicalId":14305,"journal":{"name":"International Journal of Speech Technology","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135640647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}