
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Wishart Localization Prior On Spatial Covariance Matrix In Ambisonic Source Separation Using Non-Negative Tensor Factorization
Mateusz Guzik, K. Kowalczyk
This paper presents an extension of the existing Non-negative Tensor Factorization (NTF) based method for sound source separation under reverberant conditions, formulated for Ambisonic microphone mixture signals. In particular, we address the problem of optimal exploitation of the prior knowledge concerning the source localization, through the formulation of a suitable Maximum a Posteriori (MAP) framework. Within the presented approach, the magnitude spectrograms are modelled by the NTF and the individual source Spatial Covariance Matrices (SCM) are approximated as a sum of anechoic Spherical Harmonic (SH) components, weighted with the so-called spatial selector. We constrain the SCM using the Wishart distribution, which leads to a new posterior probability and in turn to the derivation of the extended update rules. The proposed solution avoids the issues encountered in the original method, related to the empirical binary initialization strategy for the spatial selector weights, which due to multiplicative update rules may result in sound coming from certain directions not being taken into account. The proposed method is evaluated against the original algorithm and another recently proposed Expectation Maximization (EM) algorithm that also incorporates a spatial localization prior, showing improved separation performance in experiments with first-order Ambisonic recordings of musical instruments and speech utterances.
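The effect of such a Wishart localization prior can be illustrated numerically: with the prior scale matrix built from an assumed source direction, candidate SCMs aligned with that direction receive a higher prior log-density than ones concentrating energy elsewhere. A minimal real-valued sketch (the paper's SCMs are complex Hermitian, and the direction vectors below are arbitrary stand-ins, not the SH steering used by the authors):

```python
import numpy as np
from scipy.stats import wishart

p, df = 3, 10                            # matrix size, Wishart degrees of freedom
a = np.array([1.0, 0.0, 0.0])            # assumed source direction (hypothetical)
b = np.array([0.0, 1.0, 0.0])            # an orthogonal direction

# Prior scale chosen so that E[X] = df * scale = I + a a^T.
scale = (np.eye(p) + np.outer(a, a)) / df

X_aligned = np.eye(p) + np.outer(a, a)   # SCM consistent with the prior direction
X_off = np.eye(p) + np.outer(b, b)       # same eigenvalues, wrong direction

lp_aligned = wishart.logpdf(X_aligned, df, scale)
lp_off = wishart.logpdf(X_off, df, scale)
```

Both candidates have the same determinant, so the log-density gap comes purely from the trace term tr(scale⁻¹ X): the prior penalizes SCMs whose energy arrives from directions other than the assumed one.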
DOI: 10.1109/ICASSP43922.2022.9746222 (published 2022-05-23)
Citations: 3
Patch Steganalysis: A Sampling Based Defense Against Adversarial Steganography
Chuan Qin, Na Zhao, Weiming Zhang, Nenghai Yu
In recent years, the classification accuracy of CNN (convolutional neural network) steganalyzers has rapidly improved. However, just as general CNN classifiers misclassify adversarial samples, CNN steganalyzers can hardly detect adversarial steganography, which combines adversarial samples and steganography. Adversarial training and preprocessing are two effective methods to defend against adversarial samples, but the literature shows that adversarial training is ineffective against adversarial steganography, and preprocessing, which aims to wipe out adversarial perturbations, also destroys the steganographic modifications. In this paper, we propose a novel sampling-based defense method for steganalysis. Specifically, by sampling image patches, CNN steganalyzers can bypass the sparse adversarial perturbations and extract effective features. Additionally, by calculating statistical vectors and regrouping deep features, the impact on the classification accuracy of common samples is effectively compressed. The experiments show that the proposed method can significantly improve robustness against adversarial steganography without adversarial training.
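The core robustness argument — a sparse adversarial perturbation touches only a few pixels, so most randomly sampled patches remain uncontaminated — can be seen with a toy detector. Here the "steganalyzer" is just a mean-intensity threshold and the perturbation pattern is made up for illustration; neither stands in for the paper's CNN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "stego" image: a whole-image detector thresholding the mean at 0.5
# would flag it (score 0.6 > 0.5).
img = np.full((32, 32), 0.6)

# Sparse adversarial perturbation: five extreme pixels, enough to drag
# the global mean under the detection threshold.
adv = img.copy()
adv[:5, 0] -= 25.0

def patch_score(x, n_patches=99, size=8):
    """Median score over randomly sampled patches; most patches miss the
    sparse perturbation, so the aggregate stays close to the clean score."""
    scores = []
    for _ in range(n_patches):
        r = rng.integers(0, x.shape[0] - size + 1)
        c = rng.integers(0, x.shape[1] - size + 1)
        scores.append(x[r:r + size, c:c + size].mean())
    return float(np.median(scores))
```

The whole-image score is dragged below the threshold by five pixels, while the median over patch scores is essentially unchanged, because almost every sampled patch misses them.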
DOI: 10.1109/icassp43922.2022.9747638 (published 2022-05-23)
Citations: 1
Designing a QAM Signal Detector for Massive Mimo Systems via PS-ADMM Approach
Quan Zhang, Xuyang Zhao, Jiangtao Wang, Yongchao Wang
This paper presents an efficient quadrature amplitude modulation (QAM) signal detector for massive multiple-input multiple-output (MIMO) communication systems via the penalty-sharing alternating direction method of multipliers (PS-ADMM). The content of the paper is summarized as follows: first, we formulate QAM-MIMO detection as a maximum-likelihood optimization problem with bound relaxation constraints. Decomposing QAM signals into a sum of multiple binary variables and exploiting introduced binary variables as penalty functions, we transform the detection optimization model to a non-convex sharing problem; second, a customized ADMM algorithm is presented to solve the formulated non-convex optimization problem. In the implementation, all variables can be solved analytically and in parallel; third, it is proved that the proposed PS-ADMM algorithm converges under mild conditions. Simulation results demonstrate the effectiveness of the proposed approach.
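The decomposition of QAM symbols into sums of binary variables can be sketched for one real dimension of 16-QAM: each level in {-3, -1, 1, 3} equals 2·b₁ + b₀ with b₀, b₁ ∈ {-1, +1}. A minimal sketch of that mapping (one valid bit assignment; the PS-ADMM updates themselves are not reproduced here):

```python
import numpy as np

def decompose(levels, K=2):
    """Split each amplitude level in {±1, ±3, ..., ±(2^K - 1)} into K
    binary variables b_k ∈ {-1, +1} with level = sum_k 2^k * b_k."""
    r = np.asarray(levels, dtype=float)
    bits = np.zeros(r.shape + (K,))
    for k in range(K - 1, -1, -1):       # peel off the most significant bit first
        b = np.where(r >= 0, 1.0, -1.0)
        bits[..., k] = b
        r = r - (2.0 ** k) * b
    return bits

def reconstruct(bits):
    """Invert the decomposition: level = sum_k 2^k * b_k."""
    K = bits.shape[-1]
    return bits @ (2.0 ** np.arange(K))
```

Per the abstract, this is what turns detection into a sharing problem over binary variables, with binarity enforced through penalty terms rather than a hard constraint.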
DOI: 10.1109/icassp43922.2022.9747281 (published 2022-05-23)
Citations: 1
Point-Mass Filter with Decomposition of Transient Density
P. Tichavský, O. Straka, J. Duník
The paper deals with the state estimation of nonlinear stochastic dynamic systems with special attention on a grid-based numerical solution to the Bayesian recursive relations, the point-mass filter (PMF). In the paper, a novel functional decomposition of the transient density describing the system dynamics is proposed. The decomposition is based on a non-negative matrix factorization and separates the density into functions of the future and current states. Such decomposition facilitates a thrifty calculation of the convolution, which is a bottleneck of the PMF performance. The PMF estimate quality and computational costs can be efficiently controlled by choosing an appropriate rank of the decomposition. The performance of the PMF with the transient density decomposition is illustrated in a terrain-aided navigation scenario.
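The computational benefit of decomposing the transient density can be sketched with a plain non-negative matrix factorization of a discretized transition kernel: once T ≈ WH with rank r, the PMF prediction step T·p costs O(Nr) instead of O(N²). A toy 1-D sketch (the grid, kernel, and multiplicative-update NMF below are illustrative stand-ins for the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(T, rank, iters=1000, eps=1e-12):
    """Multiplicative-update NMF: T ≈ W @ H with nonnegative factors."""
    W = rng.random((T.shape[0], rank)) + 0.1
    H = rng.random((rank, T.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ T) / (W.T @ W @ H + eps)
        W *= (T @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Smooth transition kernel of a stable 1-D system on a 60-point grid;
# column j holds the discretized density of x_next given x_prev = x[j].
x = np.linspace(-3.0, 3.0, 60)
T = np.exp(-0.5 * (x[:, None] - 0.9 * x[None, :]) ** 2)
T /= T.sum(axis=0, keepdims=True)

W, H = nmf(T, rank=10)

p = np.exp(-0.5 * x ** 2)
p /= p.sum()                    # current point-mass weights
pred_full = T @ p               # O(N^2) prediction step
pred_fast = W @ (H @ p)         # O(N * r) once the factorization is cached
```

The rank acts exactly as the abstract describes: a larger rank tightens the approximation of the prediction step at higher cost, so it is the knob trading estimate quality against computation.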
DOI: 10.1109/icassp43922.2022.9747607 (published 2022-05-23)
Citations: 4
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition
Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen M. Meng
Speaker Change Detection (SCD) is the task of determining the time boundaries between speech segments of different speakers. An SCD system can be applied to many tasks, such as speaker diarization, speaker tracking, and transcribing audio with multiple speakers. Recent advancements in deep learning have led to approaches based on neural network models that directly detect speaker change points from audio data at the frame level. These approaches may be further improved by utilizing speaker information in the training data and content information extracted in an unsupervised manner. This work proposes a novel framework for the SCD task, which utilizes a multitask learning architecture to leverage speaker information during the training stage and adds content information extracted from an unsupervised speech decomposition model to help detect speaker change points. Experimental results show that the multitask learning architecture with speaker information improves SCD performance, and that adding content information extracted from the unsupervised speech decomposition model improves it further.
DOI: 10.1109/icassp43922.2022.9746116 (published 2022-05-23)
Citations: 2
A Two-Stage Contrastive Learning Framework For Imbalanced Aerial Scene Recognition
Lexing Huang, Senlin Cai, Yihong Zhuang, Changxing Jing, Yue Huang, Xiaotong Tu, Xinghao Ding
In real-world scenarios, aerial image datasets are generally class-imbalanced: the majority classes have rich samples, while the minority classes have only a few. Such class-imbalanced datasets bring great challenges to aerial scene recognition. In this paper, we explore a novel two-stage contrastive learning framework that decouples representation learning from classifier learning, thereby boosting aerial scene recognition. Specifically, in the representation learning stage, we design a data augmentation policy to improve the potential of contrastive learning according to the characteristics of aerial images, and we employ supervised contrastive learning to learn the association between aerial images of the same scene. In the classifier learning stage, we freeze the encoder to maintain good representations and use a re-balancing strategy to train a less biased classifier. A variety of experimental results on imbalanced aerial image datasets show the advantages of the proposed two-stage contrastive learning framework for imbalanced aerial scene recognition.
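Stage one of such a framework typically minimizes a supervised contrastive loss over normalized embeddings: same-label pairs are pulled together, all other pairs pushed apart. A minimal NumPy sketch of that loss (following the common SupCon formulation; the encoder, augmentation policy, and batch construction are omitted):

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, average the
    log-softmax similarity over all other samples sharing its label."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # L2-normalize embeddings
    n = len(labels)
    sim = z @ z.T / tau
    sim = np.where(np.eye(n, dtype=bool), -np.inf, sim)  # drop self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    return -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
```

Embeddings clustered consistently with their labels yield a lower loss than the same embeddings with mismatched labels, which is the signal the representation stage optimizes.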
DOI: 10.1109/icassp43922.2022.9746248 (published 2022-05-23)
Citations: 2
Convex Clustering for Autocorrelated Time Series
Max Revay, V. Solo
While clustering in general is a heavily worked area, clustering of auto-correlated time series (CATS) has received relatively little attention. Here, we develop a convex clustering algorithm suited to auto-correlated time series and compare it with a state-of-the-art method. We find the proposed algorithm is able to identify the true clusters more accurately.
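Convex (sum-of-norms) clustering in its generic form assigns one centroid per sample and fuses centroids through a pairwise-norm penalty. A didactic sketch by plain subgradient descent on Euclidean feature vectors (the paper's formulation is adapted to autocorrelated time series, and a practical solver would use ADMM or AMA rather than this loop):

```python
import numpy as np

def convex_cluster(X, lam, steps=2000, lr=0.01):
    """Sum-of-norms clustering: minimize
    0.5 * ||X - U||_F^2 + lam * sum_{i<j} ||u_i - u_j||_2
    by plain (sub)gradient descent. Didactic, not a production solver."""
    U = X.astype(float).copy()
    for _ in range(steps):
        g = U - X                                       # gradient of the fit term
        diff = U[:, None, :] - U[None, :, :]            # pairwise differences
        norms = np.linalg.norm(diff, axis=2, keepdims=True)
        unit = np.divide(diff, norms, out=np.zeros_like(diff),
                         where=norms > 1e-9)
        g += lam * unit.sum(axis=1)                     # subgradient of the penalty
        U -= lr * g
    return U
```

With lam = 0 the minimizer is U = X; increasing lam shrinks pairwise centroid distances until centroids fuse, and the resulting fusion pattern defines the clusters.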
DOI: 10.1109/icassp43922.2022.9747143 (published 2022-05-23)
Citations: 0
Domain Adaptation for Speaker Recognition in Singing and Spoken Voice
Anurag Chowdhury, Austin Cozzo, A. Ross
In this work, we study the effect of speaking style and audio condition variability between the spoken and singing voice on speaker recognition performance. Furthermore, we also explore the utility of domain adaptation for bridging the gap between multiple speaking styles (singing versus spoken) and improving overall speaker recognition performance. In that regard, we first extend a publicly available singing voice dataset, JukeBox, with corresponding spoken voice data and refer to it as JukeBox-V2. Next, we use domain adaptation for developing a speaker recognition method robust to varying speaking styles and audio conditions. Finally, we analyze the speech embeddings of domain-adapted models to explain their generalizability across varying speaking styles and audio conditions.
DOI: 10.1109/icassp43922.2022.9746111 (published 2022-05-23)
Citations: 2
A Non-Convex Proximal Approach for Centroid-Based Classification
Mewe-Hezoudah Kahanam, L. Brusquet, Ségolène Martin, J. Pesquet
In this paper, we propose a novel variational approach for supervised classification based on transform learning. Our approach consists of formulating an optimization problem on both the transform matrix and the centroids of the classes in a low-dimensional transformed space. The loss function is based on the distance to the centroids, which can be chosen in a flexible manner. To avoid trivial solutions or highly correlated clusters, our model incorporates a penalty term on the centroids, which encourages them to be separated. The resulting non-convex and non-smooth minimization problem is then solved by a primal-dual alternating minimization strategy. We assess the performance of our method on a set of supervised classification problems and compare it to state-of-the-art methods.
DOI: 10.1109/ICASSP43922.2022.9747071 (published 2022-05-23)
Citations: 0
TCRNet: Make Transformer, CNN and RNN Complement Each Other
Xinxin Shan, Tai Ma, Anqi Gu, Haibin Cai, Ying Wen
Recently, several Transformer-based methods have been presented to improve image segmentation. However, since Transformer requires regular square inputs and has difficulty capturing local feature information, segmentation performance is seriously affected. In this paper, we propose a novel encoder-decoder network named TCRNet, which makes Transformer, convolutional neural networks (CNN) and recurrent neural networks (RNN) complement each other. In the encoder, we extract and concatenate the feature maps from Transformer and CNN to effectively capture global and local feature information of images. Then, in the decoder, we utilize a convolutional RNN in the proposed recurrent decoding unit to refine the feature maps for finer prediction. Experimental results on three medical datasets demonstrate that TCRNet effectively improves segmentation precision.
DOI: 10.1109/icassp43922.2022.9747716 (published 2022-05-23)
Citations: 0