Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10043-y
Assessing American presidential candidates using principles of ontological engineering, word sense disambiguation, data envelope analysis and qualitative comparative analysis
James A. Rodger, Justin Piper
International Journal of Speech Technology
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10042-z
Time frequency domain deep CNN for automatic background classification in speech signals
Rakesh Reddy Yakkati, Sreenivasa Reddy Yeduri, Rajesh Kumar Tripathy, Linga Reddy Cenkeramaddi
International Journal of Speech Technology
Abstract: Many application areas, such as background identification, predictive maintenance in industrial applications, smart home applications, assistance for deaf people in their daily activities, and indexing and retrieval of content-based multimedia, rely on automatic background classification from speech signals. Predicting the background environment accurately from speech signal information is challenging. This paper therefore proposes a novel synchrosqueezed wavelet transform (SWT)-based deep learning (DL) approach for automatically classifying background information embedded in speech signals. The SWT is used to obtain a time-frequency representation of each speech signal, which is then fed to a deep convolutional neural network (DCNN) for classification. The proposed DCNN model consists of three convolution layers, one batch-normalization layer, three max-pooling layers, one dropout layer, and one fully connected layer. The method is tested on various background signals embedded in speech, such as airport, airplane, drone, street, babble, car, helicopter, exhibition, station, restaurant, and train sounds. According to the results, the proposed SWT-based DCNN approach achieves an overall classification accuracy of 97.96 (± 0.53)% in classifying background information embedded in speech signals. Finally, the performance of the proposed approach is compared with that of existing methods.
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10036-x
Voice user interfaces in manufacturing logistics: a literature review
Heiner Ludwig, Thorsten Schmidt, Mathias Kühn
International Journal of Speech Technology
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10031-2
Robust automatic accent identification based on the acoustic evidence
Eiman Alsharhan, Allan Ramsay
International Journal of Speech Technology
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10048-7
Using combined features to improve speaker verification in the face of limited reverberant data
Khamis A. Al-Karawi, Duraid Y. Mohammed
International Journal of Speech Technology
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10046-9
Binary classifier for identification of stammering instances in Hindi speech data
Shivam Dwivedi, Sanjukta Ghosh, Satyam Dwivedi
International Journal of Speech Technology
Pub Date: 2023-09-01 | DOI: 10.1007/s10772-023-10054-9
An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture
Mohamed Daouad, Fadoua Ataa Allah, El Wardani Dadi
International Journal of Speech Technology
Pub Date: 2023-08-25 | DOI: 10.1007/s10772-023-10038-9
Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition
Adil Chakhtouna, Sara Sekkate, A. Adib
International Journal of Speech Technology
Pub Date: 2023-07-01 | DOI: 10.1007/s10772-023-10035-y
An efficient speech emotion recognition based on a dual-stream CNN-transformer fusion network
Mohammed Tellai, L. Gao, Qi-rong Mao
International Journal of Speech Technology, Vol. 26, pp. 541-557