
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

ASR Error Correction with Dual-Channel Self-Supervised Learning
Fan Zhang, Mei Tu, Song Liu, Jinyao Yan
To improve the performance of Automatic Speech Recognition (ASR), it is common to deploy an error correction module at the post-processing stage to correct recognition errors. In this paper, we propose 1) an error correction model that uses two channels to take both contextual information and phonetic information into account, and 2) a self-supervised learning method for the model. First, an error region detection model is used to detect the error regions of the ASR output. Then, we perform dual-channel feature extraction for the error regions: one channel extracts their contextual information with a pre-trained language model, while the other channel builds their phonetic information. At the training stage, we construct error patterns at the phoneme level, which simplifies the data annotation procedure and thus allows us to leverage large-scale unlabeled data to train our model in a self-supervised manner. Experimental results on different test sets demonstrate the effectiveness and robustness of our model.
DOI: 10.1109/icassp43922.2022.9746763 (published 2022-05-23)
Citations: 5
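The abstract above says training data is produced by constructing error patterns at the phoneme level, so unlabeled text can be turned into (noisy, clean) pairs. A minimal sketch of that idea, assuming a hand-written confusion table (the `CONFUSIONS` dict, `corrupt_phonemes` and `make_training_pair` are hypothetical names, not the paper's code):

```python
import random

# Hypothetical table of acoustically confusable phonemes (illustrative only).
CONFUSIONS = {
    "p": ["b"], "b": ["p"],
    "t": ["d"], "d": ["t"],
    "s": ["z", "sh"],
    "iy": ["ih"], "ih": ["iy"],
}


def corrupt_phonemes(phonemes, confusions, rate, rng):
    """Return a copy of `phonemes` in which every phoneme that has an entry in
    `confusions` is replaced by a confusable one with probability `rate`."""
    corrupted = []
    for ph in phonemes:
        options = confusions.get(ph)
        if options and rng.random() < rate:
            corrupted.append(rng.choice(options))
        else:
            corrupted.append(ph)
    return corrupted


def make_training_pair(phonemes, rate=0.15, seed=0):
    """Build one self-supervised pair: (synthetic erroneous input, clean target)."""
    rng = random.Random(seed)
    return corrupt_phonemes(phonemes, CONFUSIONS, rate, rng), list(phonemes)


noisy, target = make_training_pair(["b", "iy", "t", "s"], rate=0.5, seed=1)
```

Phonemes without an entry in the table are never changed. In practice the confusion statistics would presumably be estimated from real ASR behaviour rather than fixed by hand as here.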
Scalable Ridge Leverage Score Sampling for the Nyström Method
Farah Cherfaoui, H. Kadri, L. Ralaivola
The Nyström method, an efficient technique for approximating Gram matrices, builds upon a small subset of the data called landmarks, whose choice affects the quality of the approximated Gram matrix. Various sampling methods have been proposed in the literature to choose such a subset, among them methods based on ridge leverage scores, which come with good theoretical and practical results. Nevertheless, direct computation of ridge leverage scores costs Θ(n³), where n is the number of data points, which is prohibitive when n is large. To tackle this problem, we propose a Θ(n) divide-and-conquer (DAC) method to approximate ridge leverage scores, and we provide theoretical guarantees and empirical results regarding their ability to blend with the Nyström approximation strategy. Our experimental results show that the proposed approximate leverage score sampling scheme achieves a good trade-off between predictive performance and running time.
DOI: 10.1109/icassp43922.2022.9747039 (published 2022-05-23)
Citations: 0
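To make the Θ(n³) bottleneck in the Nyström abstract above concrete, the sketch below computes exact ridge leverage scores τ_i(λ) = [K(K + λI)⁻¹]_ii for a small Gram matrix by Gauss-Jordan inversion, then samples landmarks with probability proportional to the scores. This is the direct computation the paper avoids, not its divide-and-conquer approximation; the regularization convention (some authors scale λ by n) and all helper names are assumptions.

```python
import random


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting (O(n^3))."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_leverage_scores(K, lam):
    """tau_i = [K (K + lam I)^{-1}]_{ii} for a symmetric PSD Gram matrix K."""
    n = len(K)
    reg = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    M = invert(reg)
    return [sum(K[i][j] * M[j][i] for j in range(n)) for i in range(n)]


def sample_landmarks(K, lam, m, seed=0):
    """Draw m landmark indices (with replacement) proportionally to the scores."""
    scores = ridge_leverage_scores(K, lam)
    rng = random.Random(seed)
    return rng.choices(range(len(K)), weights=scores, k=m)
```

Each score lies in [0, 1], and their sum is the effective dimension of K at regularization λ; points with high scores are the ones the approximation most needs as landmarks.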
User Scheduling Using Graph Neural Networks for Reconfigurable Intelligent Surface Assisted Multiuser Downlink Communications
Zhong Zhang, Tao Jiang, Weiyong Yu
A reconfigurable intelligent surface (RIS) can intelligently manipulate the phases of the incident electromagnetic wave to improve the wireless propagation environment between the base station (BS) and the users. This paper addresses the joint user scheduling, RIS configuration, and BS beamforming problem in an RIS-assisted downlink network with limited pilot overhead. We show that graph neural networks (GNNs) with permutation invariance and equivariance properties can be used to schedule users and to design RIS configurations that achieve high overall throughput while accounting for fairness among the users. Compared to the conventional methodology of first estimating the channels and then optimizing the user schedule, RIS configuration, and beamformers, this paper shows that an optimized user schedule can be obtained directly from a very short set of pilots using a GNN, the RIS configuration can then be optimized using a second GNN, and finally the BS beamformers can be designed based on the overall effective channel. Numerical results show that the proposed approach can utilize received pilots more efficiently than the conventional channel-estimation-based approach.
DOI: 10.1109/ICASSP43922.2022.9746441 (published 2022-05-23)
Citations: 3
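The RIS scheduling abstract above relies on GNN layers that are permutation equivariant (reordering the users reorders the outputs the same way) or permutation invariant (a pooled output does not change). A minimal sketch of such a layer, assuming a Deep Sets style update with hand-picked scalar weights `a` and `b` (hypothetical values, not the paper's architecture), together with a numerical check of both properties:

```python
def equivariant_layer(features, a=0.8, b=0.3):
    """Per-user update h_i' = relu(a * h_i + b * mean_j h_j).

    The weights are shared across users and the aggregation (mean) is
    symmetric, so permuting the input rows permutes the output rows."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    return [[max(0.0, a * f[d] + b * mean[d]) for d in range(dim)] for f in features]


def invariant_readout(features):
    """Sum pooling over users: a permutation-invariant summary."""
    return [sum(f[d] for f in features) for d in range(len(features[0]))]


users = [[1.0, -2.0], [0.5, 0.0], [-1.0, 3.0]]   # hypothetical per-user features
perm = [2, 0, 1]
permuted = [users[p] for p in perm]

out = equivariant_layer(users)
out_perm = equivariant_layer(permuted)
```

Because the same layer can be applied to any number of users in any order, this kind of architecture suits scheduling problems where the user set changes from frame to frame.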
Semi-Supervised Standardized Detection of Periodic Signals with Application to Exoplanet Detection
S. Sulis, D. Mary, L. Bigot
We propose a numerical methodology for detecting periodicities in unknown colored noise and for evaluating the ‘significance levels’ (p-values) of the test statistics. The procedure assumes and leverages the existence of a set of time series obtained under the null hypothesis (a null training sample, NTS) and possibly complementary side information. The test statistic is computed from a standardized periodogram, which is the pointwise division of the periodogram of the series under test by an averaged periodogram obtained from the NTS. The procedure provides accurate p-value estimates through a dedicated Monte Carlo procedure. While the methodology is general, our application here is exoplanet detection. The proposed methods are benchmarked on astrophysical data.
DOI: 10.1109/icassp43922.2022.9746081 (published 2022-05-23)
Citations: 0
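The standardized periodogram in the abstract above divides, frequency by frequency, the periodogram of the series under test by the average periodogram of the null training sample (NTS). The sketch below assumes evenly sampled data, a plain DFT periodogram, the maximum as test statistic, and a generic Monte Carlo p-value driven by a user-supplied null simulator. These are simplifications for illustration, not the authors' dedicated procedure, and the white-noise simulator is hypothetical (the paper targets unknown colored noise).

```python
import cmath
import math
import random


def periodogram(x):
    """|DFT(x)|^2 / N at the Fourier frequencies k = 1 .. N//2 - 1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
            for k in range(1, n // 2)]


def null_average(nts):
    """Average periodogram of the null training sample (NTS)."""
    pgs = [periodogram(s) for s in nts]
    return [sum(col) / len(col) for col in zip(*pgs)]


def standardized_periodogram(x, avg_null):
    """Pointwise division of the periodogram of x by the averaged NTS periodogram."""
    return [p / a for p, a in zip(periodogram(x), avg_null)]


def monte_carlo_p_value(x, avg_null, simulate_null, n_sim=99):
    """Max-statistic p-value: (1 + #{T_sim >= T_obs}) / (1 + n_sim)."""
    t_obs = max(standardized_periodogram(x, avg_null))
    exceed = sum(max(standardized_periodogram(simulate_null(), avg_null)) >= t_obs
                 for _ in range(n_sim))
    return (1 + exceed) / (1 + n_sim)


rng = random.Random(42)
N = 64


def white_noise():
    """Hypothetical null simulator: unit-variance white Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(N)]


avg_null = null_average([white_noise() for _ in range(20)])
x = [5.0 * math.sin(2 * math.pi * 8 * t / N) + rng.gauss(0.0, 1.0) for t in range(N)]
sp = standardized_periodogram(x, avg_null)
peak_k = 1 + sp.index(max(sp))
p_value = monte_carlo_p_value(x, avg_null, white_noise)
```

Dividing by the NTS average whitens the periodogram, so a single threshold can be applied across frequencies even when the noise spectrum is not flat.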
AdderIC: Towards Low Computation Cost Image Compression
Bowen Li, Xin Yao, Chao Li, Youneng Bao, Fanyang Meng, Yongsheng Liang
Recently, learned image compression methods have shown outstanding rate-distortion performance compared to traditional frameworks. Although much progress has been made in learned image compression, the computation cost remains high. To address this problem, we propose AdderIC, which uses adder neural networks (AdderNet) to construct an image compression framework. Based on the characteristics of image compression, we introduce several strategies to improve the performance of AdderNet in this field. Specifically, the Haar wavelet transform is adopted so that AdderIC learns high-frequency information efficiently. In addition, implicit deconvolution with a kernel size of 1 is applied after each adder layer to reduce spatial redundancies. Moreover, we develop a novel Adder-ID-PixelShuffle cascade upsampling structure to remove checkerboard artifacts. Experiments demonstrate that our AdderIC model largely outperforms conventional AdderNet in image compression and achieves rate-distortion performance comparable to that of its CNN baseline while reducing multiplication FLOPs by about 80% and energy consumption by 30%.
DOI: 10.1109/icassp43922.2022.9747652 (published 2022-05-23)
Citations: 1
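Adder neural networks replace the multiply-accumulate of a convolution with a negative L1 distance between input and filter, so each output needs only subtractions, absolute values and additions. The sketch below shows a fully connected adder layer next to a one-level 2D Haar transform that splits an image into four sub-bands. Both are generic textbook forms assumed for illustration; they omit AdderIC's implicit deconvolution and Adder-ID-PixelShuffle upsampling, and the sub-band names are one convention among several.

```python
def adder_dense(x, weights):
    """AdderNet-style layer: y_j = -sum_i |x_i - w_ji| (no multiplications)."""
    return [-sum(abs(xi - wi) for xi, wi in zip(x, w_row)) for w_row in weights]


def haar2d_level1(img):
    """One-level orthonormal 2D Haar transform for an image with even height and width.

    Each 2x2 block [[a, b], [c, d]] yields one coefficient per sub-band."""
    h, w = len(img), len(img[0])
    bands = {name: [] for name in ("LL", "LH", "HL", "HH")}
    for r in range(0, h, 2):
        rows = {name: [] for name in bands}
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            cc, d = img[r + 1][c], img[r + 1][c + 1]
            rows["LL"].append((a + b + cc + d) / 2)   # local average (low frequency)
            rows["LH"].append((a + b - cc - d) / 2)   # top row minus bottom row
            rows["HL"].append((a - b + cc - d) / 2)   # left column minus right column
            rows["HH"].append((a - b - cc + d) / 2)   # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands
```

The detail bands (LH, HL, HH) are zero on flat regions and large at edges, which is why feeding them to the network explicitly can help it pick up high-frequency content.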
Prototype-Based Inter-Camera Learning for Person Re-Identification
Lin Wang, Wanqian Zhang, Dayan Wu, Pingting Hong, Bo Li
Person re-identification (ReID) aims at retrieving images of the same person across non-overlapping camera views. Prior works focus on either fully supervised or unsupervised ReID settings and achieve remarkable performance. In real scenarios, however, the major annotation cost comes from matching identity classes across camera views, thus leading to the Intra-Camera Supervised (ICS) ReID problem. In this work, we propose a Prototype-based Inter-camera ReID (PIRID) method, which tackles the ICS setting through the lens of prototype learning. Specifically, we first introduce intra-camera learning with non-parametric classifiers to generate discriminative features separately within each camera view. Moreover, inter-camera prototype learning provides prototypes as the representatives of each class in the common space, making the learned features camera-agnostic. Experiments on three benchmarks, i.e., Market-1501, DukeMTMC-ReID, and MSMT17, show the superiority of our method.
DOI: 10.1109/icassp43922.2022.9746640 (published 2022-05-23)
Citations: 3
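In the prototype-learning view taken by the ReID abstract above, each identity class is represented by a prototype in a shared feature space, and a non-parametric classifier scores a feature by its similarity to every prototype. A minimal sketch, assuming prototypes are L2-normalized class means and scoring uses cosine similarity with a temperature-scaled softmax (the function names and the temperature of 0.1 are illustrative choices, not details from the paper):

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_prototypes(features, labels):
    """Return {label: L2-normalized mean feature} for every class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for d, x in enumerate(f):
            acc[d] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: normalize([x / counts[y] for x in s]) for y, s in sums.items()}


def prototype_probs(feature, prototypes, temperature=0.1):
    """Softmax over cosine similarities between one feature and each prototype."""
    f = normalize(feature)
    logits = {y: sum(a * b for a, b in zip(f, p)) / temperature for y, p in prototypes.items()}
    top = max(logits.values())
    exps = {y: math.exp(v - top) for y, v in logits.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}
```

Because the classifier has no weight matrix of its own, new identities can be added simply by computing their prototypes; how PIRID builds and updates prototypes across cameras is not specified in the abstract.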
Recovery of Noisy Pooled Tests via Learned Factor Graphs with Application to COVID-19 Testing
Eyal Fishel Ben-Knaan, Yonina C. Eldar, Nir Shlezinger
The ongoing pandemic and the necessity of frequent testing have spurred a growing interest in pooled testing. Conventional methods for recovery from pooled tests are based on group testing or compressed sensing tools, which rely on simplistic modeling of the pooling process and may not be reliable in the presence of complex and noisy measurement procedures and highly infected populations. In this work, we propose a pooled testing strategy designed for noisy settings, which bypasses the need for a tractable acquisition model. This is achieved by combining deep learning, which implicitly learns the measurement relationship from data, with factor graph inference, which exploits the known, structured pooling pattern. Learned factor graphs provide a quantitative readout corresponding to the infection severity, as opposed to group testing, which only detects the presence of infection. The proposed scheme is shown to achieve improved robustness to noise compared with previous approaches and to provide reliable estimates in highly infected populations.
DOI: 10.1109/icassp43922.2022.9747150 (published 2022-05-23)
Citations: 2
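For the pooled-testing abstract above: a factor graph for pooled tests combines one prior factor per individual with one factor per pool that scores the observed outcome given the pool's members. The sketch below computes the exact posterior infection marginals by enumerating all 2ⁿ infection patterns, which is what message passing approximates on larger populations. It assumes a hand-specified binary noise model (false-negative and false-positive rates) with illustrative default values, whereas the paper learns the measurement factor from data and produces quantitative readouts.

```python
from itertools import product


def pool_likelihood(positive, any_infected, fn_rate, fp_rate):
    """P(observed pool result | whether the pool contains an infected sample)."""
    p_pos = (1.0 - fn_rate) if any_infected else fp_rate
    return p_pos if positive else 1.0 - p_pos


def posterior_marginals(n, pools, results, prior=0.1, fn_rate=0.05, fp_rate=0.01):
    """Exact P(individual i infected | all pool results), by brute force over 2^n patterns."""
    marg = [0.0] * n
    evidence = 0.0
    for pattern in product((0, 1), repeat=n):
        weight = 1.0
        for x in pattern:                                # prior factors
            weight *= prior if x else 1.0 - prior
        for members, positive in zip(pools, results):    # pool (measurement) factors
            any_inf = any(pattern[i] for i in members)
            weight *= pool_likelihood(positive, any_inf, fn_rate, fp_rate)
        evidence += weight
        for i, x in enumerate(pattern):
            if x:
                marg[i] += weight
    return [m / evidence for m in marg]


pools = [[0, 1], [1, 2], [3]]          # hypothetical pooling pattern
results = [False, True, True]          # negative, positive, positive
post = posterior_marginals(4, pools, results)
```

Here the negative first pool clears individuals 0 and 1, so the positive second pool is attributed mostly to individual 2; enumeration is only feasible for a handful of samples, which is why belief propagation is used in practice.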
Partially Relaxed Orthogonal Least Squares Weighted Subspace Fitting Direction-of-Arrival Estimation
David Schenck, Katja Lübbe, Minh Trinh-Hoang, M. Pesavento
The Partial Relaxation framework has recently been introduced to address the Direction-of-Arrival (DOA) estimation problem [1]–[3]. DOA estimators under the Partial Relaxation (PR) framework are computationally efficient while preserving excellent DOA estimation accuracy. This is achieved by keeping the structure of the signal from the desired direction unchanged while relaxing the structure of the signals from the remaining undesired directions. This type of relaxation allows closed-form estimates of the undesired signal part to be computed and improves the accuracy of the DOA estimates compared to conventional spectral-search methods such as MUSIC. Following an approach similar to that of [4], the PR framework is combined with the Orthogonal Least Squares (OLS) technique of [5]. A novel DOA estimator based on Partially-Relaxed Weighted Subspace Fitting (PR-WSF) is proposed, in which the DOAs are estimated iteratively. One DOA is estimated per iteration, accounting both for the signal contributions of the previously determined DOAs, with full signal structure, and for the remaining DOAs, with relaxed structure. Moreover, an efficient implementation of the Partially-Relaxed Orthogonal Least Squares Weighted Subspace Fitting (PR-OLS-WSF) method is proposed whose computational cost is similar to that of the MUSIC algorithm. Simulation results show that the proposed PR-OLS-WSF estimator performs excellently, especially in difficult scenarios with low Signal-to-Noise Ratio (SNR) and closely spaced sources.
DOI: 10.1109/icassp43922.2022.9747309 (published 2022-05-23)
Citations: 1
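The DOA abstract above compares against conventional spectral-search methods, which evaluate a pseudo-spectrum on a grid of candidate angles and pick its peaks. As a simpler stand-in for MUSIC (which additionally needs an eigendecomposition), the sketch below implements the conventional (Bartlett) beamformer spectrum P(θ) = a(θ)ᴴ R a(θ) / M for a uniform linear array. It assumes half-wavelength element spacing, the steering vector a_m(θ) = exp(-jπ m sin θ), and a simulated single source; it is a baseline illustration, not PR-OLS-WSF.

```python
import cmath
import math
import random


def steering_vector(theta_deg, m_elements):
    """ULA steering vector with half-wavelength spacing."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(m_elements)]


def sample_covariance(snapshots):
    """R = (1/T) * sum_t x_t x_t^H."""
    m, t = len(snapshots[0]), len(snapshots)
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / t for j in range(m)]
            for i in range(m)]


def bartlett_spectrum(R, grid_deg):
    """Evaluate a(theta)^H R a(theta) / M on a grid of angles (degrees)."""
    m = len(R)
    spectrum = []
    for theta in grid_deg:
        a = steering_vector(theta, m)
        val = sum(a[i].conjugate() * R[i][j] * a[j] for i in range(m) for j in range(m))
        spectrum.append(val.real / m)   # real and non-negative for a Hermitian PSD R
    return spectrum


rng = random.Random(0)
M, T, TRUE_DOA = 8, 50, 20.0          # hypothetical array size, snapshots, source angle
a0 = steering_vector(TRUE_DOA, M)
snapshots = []
for _ in range(T):
    s = complex(rng.gauss(0, 1), rng.gauss(0, 1))   # random source amplitude
    snapshots.append([ai * s + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for ai in a0])

grid = list(range(-90, 91))
spec = bartlett_spectrum(sample_covariance(snapshots), grid)
estimate = grid[spec.index(max(spec))]
```

The beamwidth of this spectrum is roughly inversely proportional to the array length, so two closely spaced sources merge into one peak; that resolution limit is what subspace methods such as MUSIC, and the PR estimators, aim to overcome.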
Prosodyspeech: Towards Advanced Prosody Model for Neural Text-to-Speech
Yuanhao Yi, Lei He, Shifeng Pan, Xi Wang, Yujia Xiao
This paper proposes ProsodySpeech, a novel prosody model that enhances encoder-decoder neural Text-To-Speech (TTS) to generate highly expressive and personalized speech even with very limited training data. First, we use a Prosody Extractor built from a large speech corpus with various speakers to generate a set of prosody exemplars from multiple reference utterances, in which Mutual Information based Style content separation (MIST) is adopted to alleviate the "content leakage" problem. Second, we use a Prosody Distributor to make a soft selection of appropriate prosody exemplars at the phone level with the help of an attention mechanism. The resulting prosody feature is then aggregated into the output of the text encoder, together with an additional phone-level pitch feature to enrich the prosody. We apply this method to two tasks: highly expressive multi-style/emotion TTS and few-shot personalized TTS. The experiments show that the proposed model outperforms the FastSpeech 2 + GST baseline, with significant improvements in similarity and style expression.
DOI: 10.1109/ICASSP43922.2022.9746744 (published 2022-05-23)
Citations: 5
Improving Joint Sparse Hyperspectral Unmixing by Simultaneously Clustering Pixels According To Their Mixtures
S. F. Seyyedsalehi, H. Rabiee
In this paper, we propose a novel hierarchical Bayesian model for the sparse regression problem in semi-supervised hyperspectral unmixing. The model assumes that the signal recorded in each hyperspectral pixel is a linear combination of spectral library members, corrupted by additive Gaussian noise. To effectively exploit the spatial correlation between neighboring pixels during unmixing, we use a Markov random field that simultaneously groups pixels into clusters corresponding to regions with homogeneous mixtures in a natural scene. We assume that the sparse fractional abundances of the pixels in a cluster are generated from an exponential distribution with a shared rate parameter. We show that our method can detect regions that are not spatially connected but have similar mixtures. Experiments on synthetic and real hyperspectral images confirm that the proposed method outperforms alternative methods.
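Under the linear mixing model above, a pixel spectrum is `y = A x + n`. Here `A` is the spectral library, `x >= 0` holds the abundances and `n` is Gaussian noise with variance `sigma^2`. Placing an exponential prior with rate `rate` on each abundance turns the MAP estimate into a nonnegative L1-penalized least-squares problem:

`min 0.5 * ||y - A x||^2 + lam * sum(x)` subject to `x >= 0`, with `lam = sigma^2 * rate`.

The sketch below solves this with projected ISTA. That solver is swapped in for illustration; the paper's hierarchical Bayesian inference and its Markov random field clustering are not reproduced. Cluster assignments are assumed to be given, and every pixel in a cluster shares one penalty `lam`, mirroring the shared rate parameter. The names `unmix_pixel` and `unmix_cluster` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows: spectral bands, columns: library members
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Compute A @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, r: Vector) -> Vector:
    """Compute A^T @ r."""
    n_members = len(A[0])
    return [sum(A[i][k] * r[i] for i in range(len(A))) for k in range(n_members)]


def unmix_pixel(A: Matrix, y: Vector, lam: float, iters: int = 500) -> Vector:
    """Projected ISTA for min 0.5*||y - A x||^2 + lam*sum(x) s.t. x >= 0.

    lam stands for sigma^2 times the (assumed) exponential rate parameter.
    """
    # The squared Frobenius norm upper-bounds the Lipschitz constant ||A||_2^2,
    # so 1/lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    step = 1.0 / lipschitz
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        # Gradient step on the data term, then the proximal step of the
        # L1 penalty restricted to x >= 0: shift by step*lam and clip at 0.
        x = [max(0.0, xk - step * (gk + lam)) for xk, gk in zip(x, grad)]
    return x


def unmix_cluster(A: Matrix, pixels: List[Vector], lam: float,
                  iters: int = 500) -> List[Vector]:
    """Unmix every pixel of one (given) cluster with a shared penalty lam."""
    return [unmix_pixel(A, y, lam, iters) for y in pixels]
```

If `A` is the identity matrix, the fixed point is `x_k = max(0, y_k - lam)`, that is, soft thresholding. This gives a quick sanity check of the solver.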
DOI: 10.1109/icassp43922.2022.9746552 (https://doi.org/10.1109/icassp43922.2022.9746552), published 2022-05-23
Citations: 1