
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Non-Autoregressive Speech Recognition with Error Correction Module
Yukun Qian, Xuyi Zhuang, Zehua Zhang, Lianyu Zhou, Xu Lin, Mingjiang Wang
Autoregressive models have achieved good performance in the field of speech recognition. However, the autoregressive model uses recursive decoding and beam search in the inference stage, which leads to slow inference. On the other hand, the non-autoregressive model cannot naturally utilize context, since all tokens are output at once. To solve this problem, we propose a position-dependent non-autoregressive model. To make better use of contextual information, we also propose a pre-trained language model for speech recognition, which is placed after the non-autoregressive model as an error correction module. In this way, a small additional amount of computation is exchanged for an improvement in the recognition rate. Our method not only greatly reduces the computational cost but also maintains a good recognition rate. We tested our model on the public Chinese speech corpus AISHELL-1. Our model achieves a 6.5% character error rate with a real-time factor of only 0.0022, which is 1/17 that of the autoregressive model.
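A minimal PyTorch sketch of the pipeline described above, not the authors' implementation: a non-autoregressive decoder outputs all tokens in a single forward pass, and a separate pretrained-LM-style module then takes the draft transcript and produces corrected logits in one more forward pass, so no recursive decoding or beam search is needed at inference time. All module names, layer sizes, and vocabulary sizes are assumptions.

```python
import torch
import torch.nn as nn

class NARDecoder(nn.Module):
    """Toy non-autoregressive decoder: predicts all output tokens in one forward pass."""
    def __init__(self, feat_dim, vocab_size, max_len):
        super().__init__()
        self.max_len = max_len
        self.proj = nn.Linear(feat_dim, vocab_size)

    def forward(self, feats):                        # feats: (batch, frames, feat_dim)
        # No recursion and no beam search: one linear projection per (truncated) frame.
        return self.proj(feats[:, : self.max_len, :])

class CorrectionLM(nn.Module):
    """Toy error-correction module: maps a draft token sequence to corrected logits."""
    def __init__(self, vocab_size, emb_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        layer = nn.TransformerEncoderLayer(emb_dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, draft_tokens):                 # draft_tokens: (batch, length)
        return self.out(self.enc(self.emb(draft_tokens)))

# Inference: one NAR pass for a draft transcript, one correction pass to refine it.
nar, corrector = NARDecoder(feat_dim=80, vocab_size=4000, max_len=40), CorrectionLM(4000)
feats = torch.randn(2, 100, 80)                      # dummy acoustic features
draft = nar(feats).argmax(dim=-1)
corrected = corrector(draft).argmax(dim=-1)
```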
Citations: 0
Encoder Re-training with Mixture Signals on FastMVAE Method
Shuhei Yamaji, Taishi Nakashima, Nobutaka Ono, Li Li, H. Kameoka
In this paper, we propose a new network training method to improve the source separation performance of the fast multichannel variational autoencoder (FastMVAE) method. The FastMVAE method is very effective for supervised source separation. It also significantly reduces the processing time by replacing the backpropagation steps in the MVAE method with a single forward propagation of the encoder for estimating latent variables. In previous studies, the encoder is trained together with the decoder using clean speech. In contrast, in this study, we re-train only the encoder, using mixture signals with the decoder fixed. More specifically, using the imperfectly separated signals obtained in the course of the source separation algorithm, we train the encoder to find the optimal latent variables that minimize the objective function for source separation. Experimental results show that the proposed method reduces the objective function at almost every iteration and achieves higher separation performance than the conventional method.
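A toy PyTorch sketch of the re-training idea described above (stand-in networks, not the FastMVAE architecture): the decoder is frozen, and only the encoder is updated so that its latent estimates minimize a separation-style objective computed on imperfectly separated spectra. The IS-divergence-like loss and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in encoder/decoder (not the FastMVAE CVAE architecture).
encoder = nn.Sequential(nn.Linear(257, 128), nn.ReLU(), nn.Linear(128, 16))   # spectrum -> latent z
decoder = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 257))   # latent z -> spectrum

# Keep the decoder fixed: only the encoder is re-trained.
for p in decoder.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def separation_objective(modeled, observed, eps=1e-8):
    # Placeholder separation objective (an Itakura-Saito-like divergence per bin).
    ratio = (observed + eps) / (modeled + eps)
    return (ratio - torch.log(ratio) - 1.0).mean()

# "Imperfectly separated" magnitude spectra produced inside the separation algorithm (dummy here).
imperfect = torch.rand(32, 257)
for _ in range(10):
    z = encoder(imperfect)                                 # latent variables estimated by the encoder
    modeled = torch.nn.functional.softplus(decoder(z))     # keep modeled spectra positive
    loss = separation_objective(modeled, imperfect)
    opt.zero_grad()
    loss.backward()
    opt.step()
```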
Citations: 0
Selection of Supplementary Acoustic Data for Meta-Learning in Under-Resourced Speech Recognition
I-Ting Hsieh, Chung-Hsien Wu, Zhenqiang Zhao
Automatic speech recognition (ASR) for under-resourced languages has been a challenging task during the past decade. In this paper, treating Taiwanese as the under-resourced language, the speech data of the high-resourced languages that have the most phonemes in common with Taiwanese are selected as supplementary resources for meta-training the acoustic models for Taiwanese ASR. Mandarin, English, Japanese, Cantonese, and Thai are selected as the supplementary high-resourced languages based on the designed selection criteria. Model-agnostic meta-learning (MAML) is then used as the meta-training strategy. For evaluation, when 4000 utterances were selected from each supplementary language, we obtained a WER of 20.89% and an SER of 8.86% for Taiwanese ASR. These results are better than those of the baseline model (26.18% and 13.99%), which uses only the Taiwanese corpus and the traditional method.
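Since the abstract names MAML as the meta-training strategy, here is a minimal first-order MAML (FOMAML) sketch in PyTorch, with each "task" standing in for one supplementary language's support/query batches. The toy acoustic model, batch sizes, and learning rates are assumptions, and the first-order approximation is a simplification of full MAML.

```python
import copy
import torch
import torch.nn as nn

# Toy acoustic-model stand-in; each "task" is one supplementary language's support/query data.
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 10))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def sample_task():
    xs, ys = torch.randn(16, 40), torch.randint(0, 10, (16,))   # support set (dummy)
    xq, yq = torch.randn(16, 40), torch.randint(0, 10, (16,))   # query set (dummy)
    return xs, ys, xq, yq

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                                           # languages per meta-batch
        learner = copy.deepcopy(model)                           # fast weights for this task
        inner_opt = torch.optim.SGD(learner.parameters(), lr=1e-2)
        xs, ys, xq, yq = sample_task()
        for _ in range(5):                                       # inner-loop adaptation
            inner_opt.zero_grad()
            loss_fn(learner(xs), ys).backward()
            inner_opt.step()
        query_loss = loss_fn(learner(xq), yq)
        grads = torch.autograd.grad(query_loss, learner.parameters())
        # First-order approximation: apply the adapted learner's query gradients to the meta-model.
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```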
Citations: 1
Mainlobe Interference Suppression Method Based on Blocking Matrix Preprocessing with Low Sidelobe Constraint
Meng Haoyu, Qu Xiaodong, Zhang Xingyu, L. Wolin, Zhang Zhengyan, Yang Xiaopeng
Adaptive beamforming is widely used in phased array radar for interference and noise suppression. However, when mainlobe interference exists, mainlobe distortion, peak offset, and sidelobe level rise will occur, which seriously deteriorates the performance of adaptive beamforming. To address this issue, this paper proposes a mainlobe interference suppression method based on blocking matrix preprocessing (BMP) with a low sidelobe constraint. In the method, the singular value decomposition (SVD) is first utilized to estimate the angle of the mainlobe interference, and a blocking matrix is constructed to suppress it. Then, under a low-sidelobe-level constraint, a convex optimization problem is solved to further suppress the sidelobe interferences. Numerical simulations are conducted, and the results show the effectiveness and robustness of the method.
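A simplified NumPy sketch of the preprocessing idea only, under assumed simulation values (half-wavelength ULA, one mainlobe interferer): the dominant left singular vector of the snapshot matrix approximates the interference steering vector, a blocking matrix projects the data onto its orthogonal complement, and adaptive weights are then computed on the preprocessed data. The paper's low-sidelobe convex-optimization step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 16, 2000                                   # array elements, snapshots (assumed values)

def steer(theta_deg):                             # half-wavelength ULA steering vector
    return np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

# Simulated snapshots: target at 0 deg, strong mainlobe interference at 3 deg, white noise.
target = steer(0.0)[:, None] * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
interf = steer(3.0)[:, None] * 10 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
noise = 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = target + interf + noise

# 1) SVD: the dominant left singular vector approximates the mainlobe-interference direction.
U, _, _ = np.linalg.svd(X, full_matrices=False)
a_int = U[:, 0]

# 2) Blocking matrix: project the data onto the subspace orthogonal to a_int.
B = np.eye(M) - np.outer(a_int, a_int.conj()) / (a_int.conj() @ a_int)
X_blocked = B @ X

# 3) MVDR-style adaptive weights on the preprocessed data (diagonal loading for stability).
R = X_blocked @ X_blocked.conj().T / N + 1e-3 * np.eye(M)
a0 = B @ steer(0.0)
w = np.linalg.solve(R, a0)
w = w / (a0.conj() @ w)
```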
Citations: 1
Traceback Memory Reduction for Three-Sequence Alignment Algorithm with Affine Gap Models
Ruei-Ting Chien, Mao-Jan Lin, Yang-Ming Yeh, Yi-Chang Lu
In many hardware aligners, on-chip traceback is not supported because it requires a large amount of memory. The issue becomes even worse for three-sequence alignment, an algorithm that improves the accuracy of multiple sequence alignment. In this paper, we propose a design that reduces the traceback memory usage of three-sequence alignment with affine gap penalty models. Using the pre-computed results from the forward dynamic programming stage, we are able to encode traceback directions with fewer bits. Our algorithm can save 37.5% of memory usage when compared to direct implementations. The proposed bit-reduction method can be further combined with existing region-reduction traceback methods to lower the required memory size.
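For illustration of the bit-reduction idea, a small Python sketch that packs per-cell traceback codes into a dense bit buffer instead of one byte per cell; the code widths and the encoding itself are assumptions, not the paper's scheme. Going from 8 bits to 5 bits per cell would match the reported 37.5% figure, though the paper's actual encoding may differ.

```python
import random

def pack_codes(codes, bits_per_code):
    """Pack small integer traceback codes (each < 2**bits_per_code) into a dense bytearray."""
    buf = bytearray((len(codes) * bits_per_code + 7) // 8)
    for i, c in enumerate(codes):
        for b in range(bits_per_code):
            if (c >> b) & 1:
                pos = i * bits_per_code + b
                buf[pos // 8] |= 1 << (pos % 8)
    return buf

def unpack_codes(buf, bits_per_code, n):
    """Recover the n packed codes for use during traceback."""
    codes = []
    for i in range(n):
        c = 0
        for b in range(bits_per_code):
            pos = i * bits_per_code + b
            if (buf[pos // 8] >> (pos % 8)) & 1:
                c |= 1 << b
        codes.append(c)
    return codes

codes = [random.randrange(32) for _ in range(1000)]   # hypothetical 5-bit traceback codes
packed = pack_codes(codes, 5)
assert unpack_codes(packed, 5, len(codes)) == codes
# 5 bits per cell instead of a full byte per cell is a 37.5% reduction in traceback memory.
```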
Citations: 0
CG-Net: A Compound Gaussian Prior Based Unrolled Imaging Network
Carter Lyons, R. Raj, M. Cheney
In the age of accessible computing, machine intelligence (MI) has become a widely applicable and successful tool in image recognition. With this success, MI has, more recently, been applied to compressive sensing and tomographic imaging. One particular application of MI to image estimation, known as algorithm unrolling, is the implementation of an iterative imaging algorithm as a deep neural network (DNN). Algorithm unrolling has shown improvements in image reconstruction over both iterative imaging algorithms and standard neural networks. Here, we present a least squares iterative image estimation algorithm under the assumption of a Compound Gaussian (CG) prior for the image. The CG prior asserts that the image wavelet coefficients are a nonlinear function of two Gaussians. The developed iterative imaging algorithm is then unrolled into a DNN named CG-Net. After training, CG-Net is shown to be successful in the estimation of image wavelet coefficients from Radon transform measurements.
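A hedged sketch of a standard compound Gaussian formulation and the corresponding regularized least-squares estimate whose iterations would be unrolled; the notation is assumed and may not match the paper's exact parameterization.

```latex
% Hedged sketch, not necessarily the paper's exact parameterization: a common compound
% Gaussian (CG) prior models the image wavelet coefficients c as the elementwise product
% of a Gaussian vector u and a nonlinear function h of a second Gaussian vector x.
\[
  c = z \odot u, \qquad z = h(x), \qquad
  u \sim \mathcal{N}(0, \Sigma_u), \quad x \sim \mathcal{N}(0, \Sigma_x).
\]
% The iterative estimator that gets unrolled is then a regularized least-squares problem,
% with y the (Radon-transform) measurements, A the measurement operator, \Psi the inverse
% wavelet transform, and R_CG the penalty induced by the CG prior:
\[
  \hat{c} = \arg\min_{c} \; \tfrac{1}{2} \left\| y - A \Psi c \right\|_2^2 + R_{\mathrm{CG}}(c).
\]
```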
Citations: 2
Unified Angle Adjustment Network for Image Composition Enhancement
Jin-woong Ko, Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
We propose an angle adjustment algorithm for the composition enhancement of digital photographs. The proposed algorithm jointly learns the scene type, composition, and semantic line information of an image to improve the accuracy of angle adjustment. To this end, we design a unified angle adjustment network (UAAN), which consists of a unified encoder and four task-specific refinement modules and estimators. First, we generate shared features using the unified encoder. Then, we refine those features using the refinement modules to perform the four tasks of angle regression, scene type classification, composition classification, and semantic line detection. Experimental results demonstrate the effectiveness of the proposed UAAN algorithm.
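A minimal PyTorch sketch of the described layout, one shared encoder feeding four task-specific heads; the layer sizes, head dimensions, and class counts are assumptions, and the real UAAN refinement modules are reduced to small MLP heads here.

```python
import torch
import torch.nn as nn

class UAANSketch(nn.Module):
    """Shared encoder plus four task-specific heads (sizes and class counts are assumptions)."""
    def __init__(self, num_scene_types=5, num_compositions=9, num_line_params=4):
        super().__init__()
        self.encoder = nn.Sequential(                              # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

        def head(out_dim):                                         # stand-in refinement + estimator
            return nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, out_dim))

        self.angle_head = head(1)                                  # angle regression
        self.scene_head = head(num_scene_types)                    # scene-type classification
        self.comp_head = head(num_compositions)                    # composition classification
        self.line_head = head(num_line_params)                     # semantic-line parameters

    def forward(self, img):                                        # img: (batch, 3, H, W)
        f = self.encoder(img)
        return (self.angle_head(f), self.scene_head(f),
                self.comp_head(f), self.line_head(f))

angle, scene, comp, line = UAANSketch()(torch.randn(2, 3, 224, 224))
```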
Citations: 0
Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
Duc-Tuan Truong, Tran The Anh, Chng Eng Siong
Self-supervised learning (SSL) has played an important role in various tasks in the field of speech and audio processing. However, there is limited research on adapting these SSL models to predict the speaker's age and gender from speech signals. In this paper, we investigate seven SSL models, namely PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec, on the joint age estimation and gender classification task on the TIMIT corpus. Additionally, we study the effect of using different hidden encoder layers within these models on the age estimation result. Furthermore, we evaluate how the performance of different SSL models varies in predicting the speaker's age under simulated noisy conditions. The simulated noisy speech is created by mixing clean utterances from the TIMIT test set with random noises from the Music and Noise categories of the MUSAN corpus at multiple levels of signal-to-noise ratio (SNR). Our findings confirm that a recent SSL model, WavLM, obtains a better and more robust speech representation than the wav2vec 2.0 SSL model used in the current state-of-the-art (SOTA) approach, achieving 3.6% and 11.32% mean average error (MAE) reductions on the clean and 5 dB SNR TIMIT test sets, respectively.
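The noisy test condition described above follows a standard SNR-mixing recipe; a short NumPy sketch of that recipe (not the paper's exact script) is given below, where the clean utterance and noise clip are stand-in arrays.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it to `clean`."""
    noise = np.resize(noise, clean.shape)                 # loop or trim the noise to the utterance length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                        # stand-in for a clean TIMIT utterance
noise = rng.standard_normal(48000)                        # stand-in for a MUSAN music/noise clip
noisy_5db = mix_at_snr(clean, noise, snr_db=5)
```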
Citations: 1
A title generation method with Transformer for journal articles
Matsumoto Riku, Kimura Masaomi
While many methods of summarization have been proposed, there have been few methods to generate a title, especially for journal articles. The differences between summarization and title creation lie in length and clause form. We propose a Transformer-based title generation model for journal articles that refers to a wide range of the article. We propose to narrow the abstract down to only its important sentences before title generation, so that the author's claim can be easily reflected in the title. We applied our method to journal articles published on arXiv.org and found that our model generated titles that include words from the original titles.
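A toy Python stand-in for the "narrow the abstract down to important sentences" step: sentences are ranked by how well they cover the abstract's frequent content words, and only the top few are kept before title generation. The actual selection criterion and the Transformer title generator are the paper's own and are not reproduced here.

```python
import re
from collections import Counter

def important_sentences(abstract, keep=3):
    """Rank sentences by coverage of the abstract's frequent content words; keep the top `keep`."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', abstract) if s.strip()]
    tokens = re.findall(r'[a-z]+', abstract.lower())
    stop = {'the', 'a', 'an', 'of', 'and', 'to', 'in', 'we', 'is', 'for', 'on', 'with', 'that', 'it'}
    freq = Counter(t for t in tokens if t not in stop)

    def score(sent):
        return sum(freq[t] for t in set(re.findall(r'[a-z]+', sent.lower())))

    top = set(sorted(sentences, key=score, reverse=True)[:keep])
    return [s for s in sentences if s in top]             # preserve the original sentence order

filtered = important_sentences(
    "We propose a model. It is fast. Prior work is slow. Our model improves accuracy.", keep=2)
```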
Citations: 0
Accelerating online algorithm using geometrically constrained independent vector analysis with iterative source steering
Kana Goto, Tetsuya Ueda, Li Li, Takeshi Yamada, S. Makino
In this paper, we derive an alternative online algorithm for geometrically constrained independent vector analysis (GC-IVA) based on iterative source steering (ISS) to tackle real-time directional speech enhancement. The proposed algorithm fully exploits the advantages of the auxiliary function approach, i.e., fast convergence and no stepsize tuning, and of ISS, i.e., low computational complexity and numerical stability, making it highly suitable for practical use. In addition, we investigate the performance impact of using estimated spatial information, which is assumed to be known prior information in GC-IVA. Specifically, we evaluate the proposed algorithm with geometric constraints defined using directions of arrival (DOAs) estimated by the multiple signal classification (MUSIC) method. Experimental results revealed that the proposed online algorithm can work in real time and achieve speech enhancement performance comparable to that of the conventional online GC-AuxIVA-VCD method while significantly reducing execution time in a situation where a fixed target is interfered with by a moving interference.
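Since the geometric constraints are built from MUSIC-estimated DOAs, here is a minimal NumPy MUSIC sketch for a simulated half-wavelength ULA (array size, source angles, and noise level are assumptions); in the paper, DOAs obtained this way define the constraints used by GC-IVA, which itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, true_doas = 8, 500, [-20.0, 30.0]              # mics, snapshots, source angles (assumed)

def steer(theta_deg):                                # half-wavelength ULA steering vector
    return np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

A = np.stack([steer(t) for t in true_doas], axis=1)
S = rng.standard_normal((len(true_doas), N)) + 1j * rng.standard_normal((len(true_doas), N))
X = A @ S + 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

R = X @ X.conj().T / N                               # spatial covariance estimate
_, eigvecs = np.linalg.eigh(R)                       # eigenvalues in ascending order
En = eigvecs[:, : M - len(true_doas)]                # noise subspace

grid = np.arange(-90.0, 90.5, 0.5)
spec = np.array([1.0 / np.abs(steer(t).conj() @ En @ En.conj().T @ steer(t)) for t in grid])
peaks = [i for i in range(1, len(grid) - 1) if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
doas = sorted(grid[i] for i in sorted(peaks, key=lambda i: spec[i])[-2:])   # approx. true_doas
```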
Citations: 2