
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Adversarial Generative Distance-Based Classifier for Robust Out-of-Domain Detection
Zhiyuan Zeng, Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu
Detecting out-of-domain (OOD) intents is critical in a task-oriented dialog system. Existing methods rely heavily on extensive manually labeled OOD samples and lack robustness. In this paper, we propose an efficient adversarial attack mechanism to augment hard OOD samples and design a novel generative distance-based classifier to detect OOD samples instead of a traditional threshold-based discriminator classifier. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.
DOI: 10.1109/ICASSP39728.2021.9413908; published 2021-06-06
Citations: 11
Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers
Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du
This paper focuses on the problem of temporal dependency modeling in the CNN-based models for audio classification tasks. To capture audio temporal dependencies using CNNs, we take a different approach from the purely architecture-induced method and explicitly encode temporal dependencies into the CNN-based audio classifiers. More specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting the future features, which is formulated by leveraging the Contrastive Predictive Coding (CPC) loss. Furthermore, a novel hierarchical CPC (HCPC) model is proposed for capturing multi-level temporal dependencies at the same time. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio signals. We show that the proposed approach improves the backbone CNNs consistently on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
DOI: 10.1109/ICASSP39728.2021.9414018; published 2021-06-06
Citations: 1
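The auxiliary future-prediction objective in the abstract above is built on a contrastive (InfoNCE-style) loss. As a rough illustration only, the sketch below computes such a loss for a single predicted future feature vector. The plain dot-product score and the names `info_nce_loss`, `prediction`, `positive`, and `negatives` are assumptions made for this example; the abstract does not describe the paper's actual CPC or HCPC formulation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (used here as the similarity score)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(prediction, positive, negatives):
    """Contrastive loss: negative log-softmax score of the true future feature.

    prediction: vector predicted from past context (hypothetical input)
    positive:   the feature that actually occurs in the future
    negatives:  features sampled from elsewhere, used as distractors
    """
    # Score the true future first, then every distractor.
    scores = [dot(prediction, positive)]
    scores += [dot(prediction, negative) for negative in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - scores[0]
```

The loss shrinks as the prediction scores the true future feature higher than the distractors. When all N + 1 candidates receive the same score, it equals log(N + 1), which is the chance-level value.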
A Technique for OFDM Symbol Slicing
A. Pérez-Neira, M. Lagunas
This work presents an orthonormal transform that splits the Orthogonal Frequency Division Multiplex (OFDM) symbol into slices with ranked rate and decoding complexity. The advantage over the existing carrier or time segmentation is that the proposed technique does not depend on the frequency channel to produce slices of equal rate. Also, the encoding and the decoding complexity is kept simple.
DOI: 10.1109/ICASSP39728.2021.9414504; published 2021-06-06
Citations: 0
A Layered Embedding-Based Scheme to Cope with Intra-Frame Distortion Drift In IPM-Based HEVC Steganography
Xiaoqing Jia, Jie Wang, Yongliang Liu, Xiangui Kang, Yun-Qing Shi
The spatial correlation of intra-frame prediction units poses a major challenge when minimizing embedding distortion with syndrome-trellis coding (STC) in High Efficiency Video Coding (HEVC) steganography. To solve this problem, we propose a layered embedding scheme that embeds information into the intra-prediction modes (IPMs) of 4×4 intra-frame prediction units (PUs) in HEVC. First, we divide the PUs of an intra-frame into layers using a Hasse diagram and make modification decisions for the PUs in each layer separately, which decorrelates the correlated PUs. Second, we gather statistics on more than 100,000 sampled PU pairs to quantify how the distortions of PUs affect one another, and then design a distortion function that takes these mutual impacts into account. Experimental results show that our method significantly reduces embedding distortion and improves security compared with existing STC-based steganography methods that embed in IPMs.
DOI: 10.1109/ICASSP39728.2021.9413728; published 2021-06-06
Citations: 0
Transitive Transfer Sparse Coding for Distant Domain
Lingtian Feng, Feng Qian, Xin He, Yuqi Fan, H. Cai, Guangmin Hu
Transfer learning between source and target domains has achieved significant success in machine learning. However, existing methods cannot achieve satisfactory results when the two domains are distant, and in the worst case they can lead to negative transfer. In this paper, we propose a novel framework called transitive transfer sparse coding (TTSC) to solve the distant-domain transfer learning problem. On the one hand, as an extension of sparse coding, the TTSC framework constructs a robust, high-level dictionary across three different domains and simultaneously obtains three good sparse feature representations. On the other hand, TTSC uses the intermediate domain as a bridge to transfer valuable knowledge between the source domain and the target domain. Empirical studies show that the TTSC framework can significantly outperform state-of-the-art methods.
DOI: 10.1109/ICASSP39728.2021.9415021; published 2021-06-06
Citations: 2
An Empirical Study on Task-Oriented Dialogue Translation
Siyou Liu
Translating conversational text, in particular task-oriented dialogues, is an important application task for machine translation technology. However, it has so far not been extensively explored due to its inherent characteristics, including data limitation, discourse, informality and personality. In this paper, we systematically investigate advanced models on the task-oriented dialogue translation task, including sentence-level, document-level and non-autoregressive NMT models. Besides, we explore existing techniques such as data selection, back/forward translation, larger batch learning, finetuning and domain adaptation. To alleviate the low-resource problem, we transfer general knowledge from four different pre-training models to the downstream task. Encouragingly, we find that the best model with mBART pre-training pushes the SOTA performance on WMT20 English-German and IWSLT DIALOG Chinese-English datasets up to 62.67 and 23.21 BLEU points, respectively.
DOI: 10.1109/ICASSP39728.2021.9413521; published 2021-06-06
Citations: 0
Interpolation of Irregularly Sampled Frequency Response Functions Using Convolutional Neural Networks
M. Acerbi, R. Malvermi, Mirco Pezzoli, F. Antonacci, A. Sarti, R. Corradi
In the field of structural mechanics, classical methods for the vibrational characterization of objects exploit the inherent redundancy of a relevant amount of measurements acquired over regular sampling grids. However, there are cases in which parts of the objects under analysis are not accessible with sensors, leading to irregular sampling grids characterized by holes. Recent works have proved the benefits of adding prior knowledge in these scenarios, either through the definition of a suitable decomposition or using Finite Element modelling. In this paper, we propose to use Convolutional Autoencoders (CA) for Frequency Response Function (FRF) interpolation from grids with different subsampling schemes. The CA learns a compressed representation from a dataset of FRFs synthesized through Finite Element Analysis. Experiments with numerical and experimental data show the effectiveness of the model with different amounts of missing data and its ability to predict real FRFs characterized by different damping and sampling frequencies.
DOI: 10.1109/ICASSP39728.2021.9413458; published 2021-06-06
Citations: 3
Incorporate Maximum Mean Discrepancy in Recurrent Latent Space for Sequential Generative Model
Yuchi Zhang, Yongliang Wang, Yang Dong
Stochastic recurrent neural networks have shown promising performance for modeling complex sequences. Nonetheless, existing methods adopt KL divergence as distribution regularizations in their latent spaces, which limits the choices of models for latent distribution construction. In this paper, we incorporate maximum mean discrepancy in the recurrent structure for distribution regularization. Maximum mean discrepancy is able to measure the difference between two distributions by just sampling from them, which enables us to construct more complicated latent distributions by neural networks. Therefore, our proposed algorithm is able to model more complex sequences. Experiments conducted on two different sequential modeling tasks show that our method outperforms the state-of-the-art sequential modeling algorithms.
DOI: 10.1109/ICASSP39728.2021.9414580; published 2021-06-06
Citations: 0
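The abstract above relies on the fact that maximum mean discrepancy (MMD) can be estimated from samples alone, without density formulas for either distribution. A minimal sketch of that idea follows. The Gaussian kernel, the biased (V-statistic) estimator, and the names `gaussian_kernel`, `mmd_squared`, and `bandwidth` are assumptions chosen for illustration; the abstract does not say which kernel or estimator the paper uses.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased sample estimate of squared MMD between two sets of vectors.

    Only samples are needed: every term is an average of kernel values
    over pairs drawn from the two sample sets.
    """
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with expectations
    # replaced by averages over the samples.
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Calling `mmd_squared(samples_a, samples_b)` on two lists of vectors gives a value near zero when the samples come from similar distributions and a larger value when they are far apart, which is what makes it usable as a regularization term.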
Mixture of Informed Experts for Multilingual Speech Recognition
Neeraj Gaur, B. Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno, Manasa Prasad, B. Ramabhadran, Yun Zhu
When trained on related or low-resource languages, multilingual speech recognition models often outperform their monolingual counterparts. However, these models can suffer from loss in performance for high resource or unrelated languages. We investigate the use of a mixture-of-experts approach to assign per-language parameters in the model to increase network capacity in a structured fashion. We introduce a novel variant of this approach, ‘informed experts’, which attempts to tackle inter-task conflicts by eliminating gradients from other tasks in these task-specific parameters. We conduct experiments on a real-world task with English, French and four dialects of Arabic to show the effectiveness of our approach. Our model matches or outperforms the monolingual models for almost all languages, with gains of as much as 31% relative. Our model also outperforms the baseline multilingual model for all languages by up to 9% relative.
DOI: 10.1109/ICASSP39728.2021.9414379; published 2021-06-06
Citations: 26
Extended Object Tracking With Automotive Radar Using B-Spline Chained Ellipses Model
G. Yao, P. Wang, K. Berntorp, Hassan Mansour, P. Boufounos
This paper introduces a B-spline chained ellipses model representation for extended object tracking (EOT) using high-resolution automotive radar measurements. With offline automotive radar training datasets, the proposed model parameters are learned using the expectation-maximization (EM) algorithm. Then the probabilistic multi-hypothesis tracking (PMHT) along with the unscented transform (UT) is proposed to deal with the nonlinear forward-warping coordinate transformation, the measurement-to-ellipsis association, and the state update step. Numerical validation is provided to verify the effectiveness of the proposed EOT framework with automotive radar measurements.
DOI: 10.1109/ICASSP39728.2021.9415080; published 2021-06-06
Citations: 5