Autoregressive models have achieved strong performance in speech recognition. However, an autoregressive model relies on recursive decoding and beam search at inference time, which makes inference slow. Non-autoregressive models, on the other hand, output all tokens at once and therefore cannot naturally exploit context. To address this problem, we propose a position-dependent non-autoregressive model. To make better use of contextual information, we further propose a pre-trained language model for speech recognition, placed after the non-autoregressive model as an error correction module. In this way, we trade a small amount of extra computation for an improved recognition rate: our method greatly reduces computational cost while maintaining a good recognition rate. We tested our model on the public Chinese speech corpus AISHELL-1. Our model achieves a 6.5% character error rate with a real-time factor of only 0.0022, 1/17 that of the autoregressive model.
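To make the timing claim concrete, here is a small plain-Python sketch of how real-time factor (RTF) and the implied speedup relate; the only figures used are the 0.0022 RTF and the 1/17 ratio stated in the abstract, the baseline RTF is merely implied by them.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; lower is faster than real time."""
    return processing_seconds / audio_seconds

# Figures from the abstract: the proposed model reaches RTF 0.0022,
# stated to be 1/17 of the autoregressive baseline.
nar_rtf = 0.0022
ar_rtf = nar_rtf * 17          # implied baseline RTF = 0.0374
speedup = ar_rtf / nar_rtf

print(round(ar_rtf, 4), round(speedup, 2))
```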
Title: Non-Autoregressive Speech Recognition with Error Correction Module
Authors: Yukun Qian, Xuyi Zhuang, Zehua Zhang, Lianyu Zhou, Xu Lin, Mingjiang Wang
DOI: 10.23919/APSIPAASC55919.2022.9980031
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979843
Shuhei Yamaji, Taishi Nakashima, Nobutaka Ono, Li Li, H. Kameoka
In this paper, we propose a new network training method to improve the source separation performance of the fast multichannel variational autoencoder (FastMVAE) method. FastMVAE is very effective for supervised source separation, and it significantly reduces processing time by replacing the backpropagation steps of the MVAE method with a single forward pass of the encoder to estimate the latent variables. In previous studies, the encoder was trained together with the decoder on clean speech. In contrast, in this study we re-train only the encoder on mixed signals, with the decoder fixed. More specifically, using the imperfectly separated signals obtained during the source separation algorithm, we train the encoder to find the optimal latent variables that minimize the source separation objective function. Experimental results show that the proposed method reduces the objective function at almost every iteration and achieves higher separation performance than the conventional method.
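The idea of updating only the encoder while the decoder stays fixed can be illustrated on a linear toy autoencoder. This is a hypothetical numpy sketch with closed-form gradients, not the FastMVAE architecture; the dimensions and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, t = 8, 3, 200
X = rng.standard_normal((d, t))          # stand-in for mixture observations
D = rng.standard_normal((d, k))          # "decoder": fixed after pre-training
E = rng.standard_normal((k, d)) * 0.01   # "encoder": the only part we re-train

def loss(E):
    R = X - D @ (E @ X)                  # reconstruction residual
    return (R * R).sum() / t

lr = 1e-3
history = [loss(E)]
for _ in range(200):
    R = X - D @ (E @ X)
    grad_E = -2.0 * D.T @ R @ X.T / t    # d(loss)/dE; D is never updated
    E -= lr * grad_E
    history.append(loss(E))
```

Freezing the decoder keeps the generative model intact while the encoder adapts to imperfect, mixture-derived inputs, which mirrors the training split described in the abstract.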
Title: Encoder Re-training with Mixture Signals on FastMVAE Method
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979997
I-Ting Hsieh, Chung-Hsien Wu, Zhenqiang Zhao
Automatic speech recognition (ASR) for under-resourced languages has been a challenging task over the past decade. In this paper, treating Taiwanese as the under-resourced language, speech data from high-resourced languages that share the most phonemes with Taiwanese are selected as supplementary resources for meta-training the acoustic models for Taiwanese ASR. Mandarin, English, Japanese, Cantonese, and Thai are selected as the supplementary high-resourced languages based on the designed selection criteria. Model-agnostic meta-learning (MAML) is then used as the meta-training strategy. For evaluation, when 4000 utterances were selected from each supplementary language, we obtained a WER of 20.89% and an SER of 8.86% for Taiwanese ASR, better than the baseline model (26.18% and 13.99%) trained using only the Taiwanese corpus and a traditional method.
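MAML's inner/outer-loop structure can be sketched on scalar quadratic tasks. This is purely illustrative: the paper applies MAML to acoustic models, whereas here each "task" is a toy loss with its own optimum, and the learning rates are arbitrary assumptions.

```python
import numpy as np

# Each "task" i has loss L_i(theta) = (theta - t_i)^2 with its own optimum t_i.
targets = np.array([-1.0, 0.5, 2.0])   # stand-ins for per-language tasks
alpha, beta = 0.1, 0.05                # inner / outer learning rates
theta = 5.0                            # meta-initialisation

def meta_loss(th):
    # loss after one inner gradient step on each task
    adapted = th - alpha * 2.0 * (th - targets)
    return np.mean((adapted - targets) ** 2)

for _ in range(100):
    adapted = theta - alpha * 2.0 * (theta - targets)
    # exact meta-gradient: d/dtheta (adapted - t)^2 = 2*(adapted - t)*(1 - 2*alpha)
    grad = np.mean(2.0 * (adapted - targets) * (1.0 - 2.0 * alpha))
    theta -= beta * grad
```

The meta-initialisation drifts toward a point from which one inner step adapts well to every task — here, the mean of the task optima.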
Title: Selection of Supplementary Acoustic Data for Meta-Learning in Under-Resourced Speech Recognition
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980066
Meng Haoyu, Qu Xiaodong, Zhang Xingyu, L. Wolin, Zhang Zhengyan, Yang Xiaopeng
Adaptive beamforming is widely used in phased array radar for interference and noise suppression. However, when mainlobe interference exists, mainlobe distortion, peak offset, and sidelobe level rise occur, which seriously deteriorate the performance of adaptive beamforming. To address this issue, this paper proposes a mainlobe interference suppression method based on blocking matrix preprocessing (BMP) with a low sidelobe constraint. In the method, singular value decomposition (SVD) is first used to estimate the angle of the mainlobe interference, and a blocking matrix is constructed to suppress it. Then, under the constraint of a low sidelobe level, a convex optimization problem is solved to further suppress the sidelobe interferences. Numerical simulations are conducted, and the results show the effectiveness and robustness of the method.
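The SVD-then-block step can be sketched in numpy for a uniform linear array. This is a generic illustration under assumed geometry (half-wavelength ULA, a single strong mainlobe interferer), not the paper's exact formulation or its low-sidelobe optimization.

```python
import numpy as np

def steering(theta_deg, n_elems, spacing=0.5):
    """Steering vector of a uniform linear array (spacing in wavelengths)."""
    n = np.arange(n_elems)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

M = 10
rng = np.random.default_rng(1)
a_int = steering(2.0, M)                       # mainlobe interference direction
snaps = np.outer(a_int, rng.standard_normal(500)) \
        + 0.01 * (rng.standard_normal((M, 500)) + 1j * rng.standard_normal((M, 500)))
R = snaps @ snaps.conj().T / 500               # sample covariance

# SVD of the covariance: the dominant left singular vector estimates the
# interference subspace; projecting it out blocks the mainlobe interference.
U, s, _ = np.linalg.svd(R)
v = U[:, 0]
B = np.eye(M) - np.outer(v, v.conj())          # blocking matrix
residual = np.linalg.norm(B @ a_int) / np.linalg.norm(a_int)
```

After preprocessing, data passed through `B` contains almost no energy from the interference direction, so a conventional beamformer can operate on it without mainlobe distortion.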
Title: Mainlobe Interference Suppression Method Based on Blocking Matrix Preprocessing with Low Sidelobe Constraint
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980113
Ruei-Ting Chien, Mao-Jan Lin, Yang-Ming Yeh, Yi-Chang Lu
In many hardware aligners, on-chip traceback is not supported because it requires a large amount of memory. The issue becomes even worse for three-sequence alignment, an algorithm that improves the accuracy of multiple sequence alignment. In this paper, we propose a design that reduces traceback memory usage for three-sequence alignment with affine gap penalty models. Using pre-computed results from the forward dynamic programming stage, we are able to encode traceback directions with fewer bits. Our algorithm saves 37.5% of memory usage compared to direct implementations. The proposed bit-reduction method can be further combined with existing region-reduction traceback methods to lower the required memory size.
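As a generic illustration of storing traceback directions in fewer bits, the sketch below packs 3-bit direction codes into a byte buffer instead of one byte per cell. The paper's encoding, which exploits forward-pass results to reach its 37.5% saving, is more elaborate; this only shows the packing mechanics.

```python
def pack_directions(dirs, bits=3):
    """Pack small integer direction codes into a compact byte buffer."""
    buf, acc, nbits = bytearray(), 0, 0
    for d in dirs:
        assert 0 <= d < (1 << bits)
        acc |= d << nbits
        nbits += bits
        while nbits >= 8:
            buf.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        buf.append(acc & 0xFF)
    return bytes(buf)

def unpack_directions(buf, count, bits=3):
    acc = int.from_bytes(buf, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]

dirs = [0, 5, 3, 7, 1, 2, 6, 4, 0, 7]   # hypothetical per-cell direction codes
packed = pack_directions(dirs)           # 10 cells: 4 bytes instead of 10
```

Round-tripping through `unpack_directions` recovers the original codes, so the traceback walk is unaffected by the denser storage.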
Title: Traceback Memory Reduction for Three-Sequence Alignment Algorithm with Affine Gap Models
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980294
Carter Lyons, R. Raj, M. Cheney
In the age of accessible computing, machine intelligence (MI) has become a widely applicable and successful tool in image recognition. With this success, MI has, more recently, been applied to compressive sensing and tomographic imaging. One particular application of MI to image estimation, known as algorithm unrolling, is the implementation of an iterative imaging algorithm as a deep neural network (DNN). Algorithm unrolling has shown improvements in image reconstruction over both iterative imaging algorithms and standard neural networks. Here, we present a least squares iterative image estimation algorithm under the assumption of a Compound Gaussian (CG) prior for the image. The CG prior asserts that the image wavelet coefficients are a nonlinear function of two Gaussians. The developed iterative imaging algorithm is then unrolled into a DNN named CG-Net. After training, CG-Net is shown to be successful in the estimation of image wavelet coefficients from Radon transform measurements.
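Algorithm unrolling is easiest to see with ISTA for sparse recovery, where each iteration becomes one "layer" of the network. This is an illustrative numpy sketch under a simple sparsity prior, not the paper's compound Gaussian algorithm; in CG-Net the unrolled update and the learned parameters differ.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, layers = 40, 100, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement operator
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
y = A @ x_true                                 # noiseless measurements

L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the data term
lam = 0.05                                     # sparsity weight

def soft(z, t):                                # soft-thresholding "activation"
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(x):
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(n)
objs = [objective(x)]
for _ in range(layers):                        # each loop body = one unrolled layer
    x = soft(x + A.T @ (y - A @ x) / L, lam / L)
    objs.append(objective(x))
```

In an unrolled network, the fixed quantities here (step size, threshold, even `A.T`) become per-layer learnable parameters trained end to end, which is what gives unrolling its advantage over running the plain iteration.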
Title: CG-Net: A Compound Gaussian Prior Based Unrolled Imaging Network
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979887
Jin-woong Ko, Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
We propose an angle adjustment algorithm for the composition enhancement of digital photographs. The proposed algorithm jointly learns the scene type, composition, and semantic line information of an image to improve the accuracy of angle adjustment. To this end, we design a unified angle adjustment network (UAAN), which consists of a unified encoder and four task-specific refinement modules and estimators. First, we generate shared features using the unified encoder. Then, we refine those features using the refinement modules to perform the four tasks of angle regression, scene type classification, composition classification, and semantic line detection. Experimental results demonstrate the effectiveness of the proposed UAAN algorithm.
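The one-encoder/four-heads structure can be shown schematically with random weights. The dimensions and head shapes below are illustrative assumptions, not the UAAN configuration, and the real network uses task-specific refinement modules between the shared features and each estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden = 512, 128
n_scenes, n_compositions, n_lines = 10, 9, 32   # hypothetical class/output counts

W_enc = rng.standard_normal((hidden, feat_dim)) * 0.01
heads = {
    "angle":       rng.standard_normal((1, hidden)) * 0.01,           # regression
    "scene":       rng.standard_normal((n_scenes, hidden)) * 0.01,    # classification
    "composition": rng.standard_normal((n_compositions, hidden)) * 0.01,
    "lines":       rng.standard_normal((n_lines, hidden)) * 0.01,     # line parameters
}

def forward(x):
    shared = np.maximum(W_enc @ x, 0.0)   # unified encoder (ReLU) -> shared features
    return {name: W @ shared for name, W in heads.items()}

out = forward(rng.standard_normal(feat_dim))
```

Sharing one encoder forces the four tasks to regularise each other, which is the stated reason joint learning improves the angle-regression accuracy.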
Title: Unified Angle Adjustment Network for Image Composition Enhancement
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979878
Duc-Tuan Truong, Tran The Anh, Chng Eng Siong
Self-supervised learning (SSL) has played an important role in various speech and audio processing tasks. However, there is limited research on adapting SSL models to predict a speaker's age and gender from speech signals. In this paper, we investigate seven SSL models, namely PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec, on the joint age estimation and gender classification task using the TIMIT corpus. Additionally, we study the effect of using different hidden encoder layers within these models on the age estimation results. Furthermore, we evaluate how the performance of the SSL models varies when predicting a speaker's age under simulated noisy conditions. The simulated noisy speech is created by mixing clean utterances from the TIMIT test set with random noises from the Music and Noise categories of the MUSAN corpus at multiple signal-to-noise ratio (SNR) levels. Our findings confirm that a recent SSL model, WavLM, obtains better and more robust speech representations than the wav2vec 2.0 model used in the current state-of-the-art (SOTA) approach, achieving 3.6% and 11.32% mean absolute error (MAE) reductions on the clean and 5 dB SNR TIMIT test sets, respectively.
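One common way to use different hidden encoder layers of an SSL model is a learnable softmax-weighted sum over the layer outputs. The sketch below shows that aggregation with random stand-in features; it is a generic probing recipe, not necessarily the exact setup used in the paper.

```python
import numpy as np

def weighted_layer_sum(hidden_states, layer_logits):
    """hidden_states: (num_layers, frames, dim); returns ((frames, dim), weights)."""
    w = np.exp(layer_logits - layer_logits.max())
    w /= w.sum()                                  # softmax over layers
    return np.tensordot(w, hidden_states, axes=1), w

rng = np.random.default_rng(0)
states = rng.standard_normal((13, 50, 768))       # e.g. 13 transformer layer outputs
feats, w = weighted_layer_sum(states, rng.standard_normal(13))
```

In practice `layer_logits` would be trained jointly with the downstream age/gender head, and the learned weights reveal which layers carry the most speaker information.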
Title: Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979942
Matsumoto Riku, Kimura Masaomi
While many summarization methods have been proposed, few methods generate titles, especially for journal articles. Title generation differs from summarization in output length and clause form. We propose a Transformer-based title generation model for journal articles that refers to a wide range of the article. We narrow the abstract down to only its important sentences before title generation, so that the author's claim is more easily reflected in the title. We applied our method to journal articles published on arXiv.org and found that our model generates titles containing words from the original titles.
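Narrowing an abstract down to its important sentences can be sketched with a simple frequency-based score. The paper's actual selection criterion is not specified here, so this scoring rule, and the toy abstract, are assumptions for illustration only.

```python
from collections import Counter

def top_sentences(sentences, k=2):
    """Keep the k sentences whose words are most frequent across the document."""
    df = Counter(w for s in sentences for w in s.lower().split())
    scores = [sum(df[w] for w in s.lower().split()) / max(len(s.split()), 1)
              for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:k])]   # keep original order

abstract = [
    "We propose a new model for title generation.",
    "The weather section is unrelated filler text.",
    "Our model improves title generation over strong baselines.",
]
kept = top_sentences(abstract, k=2)
```

The filler sentence shares no vocabulary with the rest of the toy abstract, so it scores lowest and is dropped before the Transformer sees the input.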
Title: A title generation method with Transformer for journal articles
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980301
Kana Goto, Tetsuya Ueda, Li Li, Takeshi Yamada, S. Makino
In this paper, we derive an alternative online algorithm for geometrically constrained independent vector analysis (GC-IVA) based on iterative source steering (ISS) to tackle real-time directional speech enhancement. The proposed algorithm fully exploits the advantages of the auxiliary function approach (fast convergence and no step-size tuning) and of ISS (low computational complexity and numerical stability), making it highly suitable for practical use. In addition, we investigate the performance impact of using estimated spatial information, which is assumed to be known prior information in GC-IVA. Specifically, we evaluate the proposed algorithm with geometric constraints defined using directions of arrival (DOAs) estimated by the multiple signal classification (MUSIC) method.
Experimental results revealed that the proposed online algorithm works in real time and achieves speech enhancement performance comparable to that of the conventional online GC-AuxIVA-VCD method, while significantly reducing execution time in a scenario where a fixed target is interfered with by a moving interferer.

Title: Accelerating online algorithm using geometrically constrained independent vector analysis with iterative source steering
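The low complexity of ISS comes from its rank-one demixing updates, which avoid matrix inversion entirely. The sketch below follows the general AuxIVA-ISS update form for a single frequency bin without the geometric constraint; the weighting and normalisation details are simplified assumptions, not the paper's exact online algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 400                          # sources/mics, frames
X = rng.standard_normal((K, T)) + 1j * rng.standard_normal((K, T))
W = np.eye(K, dtype=complex)           # demixing matrix
Y = W @ X                              # current source estimates

eps = 1e-8
for s in range(K):                     # one ISS sweep: one rank-1 update per source
    r = np.abs(Y) + eps                # per-source activity weights (single bin)
    v = np.empty(K, dtype=complex)
    for k in range(K):
        num = np.mean(Y[k] * Y[s].conj() / r[k])
        den = np.mean(np.abs(Y[s]) ** 2 / r[k])
        v[k] = num / den
    v[s] = 1.0 - 1.0 / np.sqrt(np.mean(np.abs(Y[s]) ** 2 / r[s]))
    W -= np.outer(v, W[s])             # rank-1 update: no inverse, O(K^2) per source
    Y -= np.outer(v, Y[s])             # keep Y consistent with W @ X
```

Because both `W` and `Y` receive the same rank-one correction, the estimates stay consistent with `W @ X` without ever re-multiplying the full data, which is what makes the online variant cheap per frame.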