
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Audio Peak Reduction Using a Synced Allpass Filter
Sebastian J. Schlecht, Leonardo Fierro, V. Välimäki, J. Backman
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while maintaining the total energy of the signal. In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay line and gain parameters are synced to match peaks of the signal’s auto-correlation function. The proposed method is compared with a previous search method and is shown to be often superior. An evaluation conducted over a variety of test signals indicates that the achieved peak reduction spans from 0 to 5 dB depending on the input waveform. The proposed method is widely applicable to real-time sound reproduction with a minimal computational processing budget.
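The core mechanism is easy to prototype. Below is a minimal Python sketch of a Schroeder allpass filter, y[n] = -g*x[n] + x[n-M] + g*y[n-M], whose delay M is synced to the strongest non-zero-lag peak of the signal's autocorrelation. The fixed gain of 0.5, the search range, and the two-sinusoid test signal are illustrative assumptions, not the paper's parameter selection rule.

```python
import numpy as np

def schroeder_allpass(x, delay, gain):
    """Schroeder allpass filter: y[n] = -g*x[n] + x[n-M] + g*y[n-M]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + x_d + gain * y_d
    return y

def synced_allpass_peak_reduction(x, max_lag=2000, gain=0.5):
    """Sync the delay line to the largest non-zero-lag autocorrelation peak."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    delay = int(np.argmax(r[1:max_lag])) + 1          # skip the lag-0 peak
    return schroeder_allpass(x, delay, gain)

# Toy input: two harmonically related sinusoids with strong periodic peaks.
fs = 44100
t = np.arange(fs // 4) / fs
x = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.4 * np.sin(2 * np.pi * 440 * t)
y = synced_allpass_peak_reduction(x)
print(f"peak before: {np.max(np.abs(x)):.3f}, after: {np.max(np.abs(y)):.3f}")
```

Since the filter is allpass, the total signal energy is preserved; only the phase around the autocorrelation-matched lag is altered, which is what smears the waveform peak.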
Citations: 0
Cross-Target Stance Detection Via Refined Meta-Learning
Huishan Ji, Zheng Lin, Peng Fu, Weiping Wang
Cross-target stance detection (CTSD) aims to identify the stance of a text towards a target when stance annotations are available only for different (though related) targets. Recently, models based on external semantic and emotion knowledge have been proposed for CTSD, achieving promising performance. However, such solutions rely on extensive external resources and harness only a single source target, leaving the other available targets unused. To address this problem, we propose a many-to-one CTSD model based on meta-learning. To make the most of meta-learning, we further refine it with a balanced, easy-to-hard learning pattern. Specifically, for multiple-target training, we feed the model according to the similarity among targets, and we utilize two re-balancing strategies to deal with the imbalance in the data. We conduct experiments on SemEval 2016 Task 6, and the results demonstrate that our method is effective and establishes a new state-of-the-art macro-F1 score for CTSD.
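As one possible reading of the similarity-driven feeding schedule, the sketch below orders source targets from most to least similar to the test target (easy to hard) using cosine similarity of mean-pooled embeddings. All names and the pooling choice are hypothetical illustrations; the paper's actual similarity measure and re-balancing strategies are not reproduced here.

```python
import torch
import torch.nn.functional as F

def easy_to_hard_order(source_embs, test_emb):
    """Rank source targets by similarity to the test target, so that
    meta-training can proceed from related (easy) to unrelated (hard)."""
    sims = torch.stack([
        F.cosine_similarity(e.mean(dim=0), test_emb, dim=0)
        for e in source_embs])
    return torch.argsort(sims, descending=True), sims

# Toy mean-pooled sentence embeddings: three source targets, one test target.
sources = [torch.randn(50, 128), torch.randn(80, 128), torch.randn(30, 128)]
test_target = torch.randn(128)
order, sims = easy_to_hard_order(sources, test_target)
print(order.tolist(), sims.tolist())
```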
Citations: 3
A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals
Weilai Li, Lanfeng Zhong, Weixi Xiang, Tongzhou Kang, Dakun Lai
High frequency oscillations (HFOs) have demonstrated their potential as an effective biomarker for epilepsy. However, most existing HFOs detectors are based on manual feature extraction and supervised learning, which incur a laborious feature selection and a time-consuming labeling process. To tackle these issues, we propose an automatic unsupervised HFOs detector based on a convolutional variational autoencoder (CVAE). First, each selected HFO candidate (obtained via an initial detection method) is converted into a 2-D time-frequency map (TFM) using the continuous wavelet transform (CWT). Then, a CVAE is trained on the red channel of the TFM (R-TFM) dataset to achieve dimensionality reduction and reconstruction of the input features. The reconstructed R-TFM dataset is then classified by the K-means algorithm. Experimental results show that the proposed method outperforms four existing detectors, achieving 92.85% accuracy, 93.91% sensitivity, and 92.14% specificity.
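To illustrate the unsupervised pipeline, here is a compact PyTorch sketch of a convolutional VAE trained to reconstruct 64x64 single-channel maps, followed by K-means over the reconstructions, mirroring the reconstruct-then-cluster flow described above. The architecture sizes, training loop, and random stand-in tensors (in place of real CWT-derived R-TFMs) are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class ConvVAE(nn.Module):
    """Minimal convolutional VAE over 64x64 single-channel maps."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, 2, 1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(16, 32, 4, 2, 1), nn.ReLU(),   # 32 -> 16
            nn.Flatten())
        self.fc_mu = nn.Linear(32 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(32 * 16 * 16, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 32 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Sigmoid())  # 32 -> 64

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(self.fc_dec(z).view(-1, 32, 16, 16)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

x = torch.rand(128, 1, 64, 64)   # stand-in for a batch of candidate R-TFMs
model = ConvVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):               # a few illustrative training steps
    recon, mu, logvar = model(x)
    loss = vae_loss(recon, x, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Cluster the reconstructed maps into two groups (HFO vs. non-HFO).
with torch.no_grad():
    recon, _, _ = model(x)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(
    recon.reshape(len(x), -1).numpy())
```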
Citations: 0
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech
Varun Krishna, Sriram Ganapathy
The automatic discovery of acoustic sub-word units from raw speech, without any text or labels, is a growing field of research. The key challenge is to derive representations of speech that can be categorized into a small number of phoneme-like units that are speaker invariant and can broadly capture the content variability of speech. In this work, we propose a novel neural network paradigm that uses a deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. Both loss functions, the CPC loss and the clustering loss, are self-supervised. The clustering cost uses phoneme-like labels generated with an iterative k-means algorithm. Including this loss ensures that the model representations can be categorized into a small number of automatic speech units. We experiment with several sub-tasks described as part of the ZeroSpeech 2021 challenge to illustrate the effectiveness of the framework. In these experiments, we show that the proposed representation learning approach improves significantly over previous self-supervision-based models as well as the wav2vec family of models on a range of word-level similarity tasks and language modeling tasks.
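The combined objective can be written compactly. The PyTorch sketch below adds a one-step InfoNCE-style CPC loss to a clustering cross-entropy computed against k-means pseudo-labels; the encoder outputs and autoregressive context are replaced by random stand-in tensors, and all dimensions and the single-step prediction horizon are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

B, T, D, K = 8, 100, 64, 50                   # batch, frames, dims, units
z = torch.randn(B, T, D, requires_grad=True)  # encoder outputs (stand-in)
c = torch.randn(B, T, D)                      # autoregressive context (stand-in)
W = nn.Linear(D, D, bias=False)               # CPC projection for step k = 1

def cpc_infonce(z, c, W):
    """One-step InfoNCE: predict z[t+1] from c[t]; other frames are negatives."""
    pred = W(c[:, :-1]).reshape(-1, D)
    target = z[:, 1:].reshape(-1, D)
    logits = pred @ target.t()                # similarity to all candidates
    labels = torch.arange(logits.size(0))     # the matching frame is positive
    return F.cross_entropy(logits, labels)

def clustering_loss(z, classifier):
    """Cross-entropy against k-means pseudo-labels (refreshed iteratively)."""
    flat = z.reshape(-1, D)
    pseudo = KMeans(n_clusters=K, n_init=4).fit_predict(flat.detach().numpy())
    return F.cross_entropy(classifier(flat),
                           torch.as_tensor(pseudo, dtype=torch.long))

clf = nn.Linear(D, K)                         # phoneme-like unit classifier
loss = cpc_infonce(z, c, W) + clustering_loss(z, clf)
loss.backward()
```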
Citations: 1
Text Adaptive Detection for Customizable Keyword Spotting
Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu
Always-on keyword spotting (KWS), i.e., wake word detection, is widely used in many voice assistant applications running on smart devices. Although fixed wake-word detection trained on specifically collected data has reached high performance, it is still challenging to build an arbitrarily customizable detection system on general found data. A deep learning classifier, similar to the one used in speech recognition, can be applied, but its detection performance is usually significantly degraded. In this work, we propose a novel text adaptive detection framework that directly formulates KWS as a detection rather than a classification problem. Here, the text prompt is used as an input to promote biased classification, and a series of frame-level and sequence-level detection criteria are employed to replace the cross-entropy criterion and directly optimize detection performance. Experiments on a keyword-spotting version of the Wall Street Journal (WSJ) dataset show that the text adaptive detection framework achieves an average relative improvement of 16.88% in the detection metric F1-score over the baseline model.
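To make the detection formulation concrete, the sketch below scores each frame against the embedded text prompt and applies a max-pooled sequence-level criterion: the utterance fires if any frame fires. This is a schematic reading with stand-in tensors; the paper's specific frame- and sequence-level criteria are not reproduced.

```python
import torch
import torch.nn.functional as F

B, T, D = 4, 80, 128
frames = torch.randn(B, T, D)             # acoustic encoder outputs (stand-in)
prompt = torch.randn(B, D)                # embedded text prompt (stand-in)
present = torch.tensor([1., 0., 1., 0.])  # keyword present in each utterance?

# Frame-level keyword scores conditioned on the prompt (scaled dot product).
scores = torch.sigmoid((frames * prompt.unsqueeze(1)).sum(-1) / D ** 0.5)

# Sequence-level detection criterion: an utterance fires if any frame fires.
seq_score = scores.max(dim=1).values
detection_loss = F.binary_cross_entropy(seq_score, present)
```

When frame alignments are available, the same `scores` tensor could also be supervised directly with a frame-level criterion.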
Citations: 2
Pyramid Fusion Attention Network For Single Image Super-Resolution
Hao He, Zongcai Du, Wenfeng Li, Jie Tang, Gangshan Wu
Recently, convolutional neural networks (CNNs) have made remarkable advances in image super-resolution (SR). Most recent models exploit an attention mechanism (AM) to focus on high-frequency information. However, these methods consider only interdependencies among channels or among spatial positions, leading to equal treatment of channel-wise or spatial-wise features and thus limiting the power of the AM. In this paper, we propose a pyramid fusion attention network (PFAN) to tackle this problem. Specifically, a novel pyramid fusion attention (PFA) module is developed, in which stacked residual blocks model the relationship between pixels across all channels, and a pyramid fusion structure is adopted to expand the receptive field. Besides, a progressive backward fusion strategy is introduced to make full use of hierarchical features, which is beneficial for obtaining more contextual representations. Comprehensive experiments demonstrate the superiority of our proposed PFAN over state-of-the-art methods.
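One plausible instantiation of such a module is sketched below: several residual-block branches run at different pooled resolutions (the pyramid, enlarging the receptive field), their outputs are fused into a pixel-wise attention map, and the map reweights the input. The branch count, scales, and fusion by a 1x1 convolution are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PyramidFusionAttention(nn.Module):
    """Fuse multi-scale residual branches into one pixel-wise attention map."""
    def __init__(self, ch, scales=(1, 2, 4), n_blocks=2):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
            for _ in scales)
        self.fuse = nn.Conv2d(ch * len(scales), ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for s, branch in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, s) if s > 1 else x  # coarser scale, larger RF
            y = branch(y)
            feats.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                       align_corners=False))
        attn = torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))
        return x * attn                             # pixel-wise reweighting

x = torch.randn(1, 32, 48, 48)
print(PyramidFusionAttention(32)(x).shape)          # torch.Size([1, 32, 48, 48])
```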
Citations: 1
Multi-Scale Refinement Network Based Acoustic Echo Cancellation
Fan Cui, Liyong Guo, Wenfeng Li, Peng Gao, Yujun Wang
Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as strided convolutions in the encoder layers significantly decrease the feature resolution, leading to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit information at different feature scales. In the encoder stage, high-level features are extracted to obtain a coarse result. Then, the decoder layers with multiple refinement paths directly refine the result with fine-grained features. Refinement paths at different feature scales are combined by learnable weights. The experimental results show that the proposed multi-scale refinement structure significantly improves the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.
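The learnable combination of refinement paths is simple to express. Below is a minimal PyTorch sketch that upsamples decoder features from several scales to a common resolution and mixes them with softmax-normalized learnable weights; the shapes and the nearest-neighbor upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedScaleFusion(nn.Module):
    """Combine refinement paths from different scales via learnable weights."""
    def __init__(self, n_paths):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_paths))  # softmax -> convex weights

    def forward(self, feats, size):
        ups = [F.interpolate(f, size=size, mode="nearest") for f in feats]
        w = torch.softmax(self.w, dim=0)
        return sum(wi * fi for wi, fi in zip(w, ups))

# Decoder features at 1/4, 1/2, and full resolution (stand-in tensors).
f1 = torch.randn(1, 16, 16, 16)
f2 = torch.randn(1, 16, 32, 32)
f3 = torch.randn(1, 16, 64, 64)
fused = WeightedScaleFusion(3)([f1, f2, f3], size=(64, 64))
print(fused.shape)  # torch.Size([1, 16, 64, 64])
```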
Citations: 3
Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting
Jesper Brunnström, Shoichi Koyama, Marc Moonen
A sound zone control method is proposed, based on the frequency-domain variable span trade-off filter (VAST). Existing VAST methods optimize the sound field at a set of discrete points, while the proposed method uses kernel interpolation to instead optimize the sound field over a continuous region. When the loudspeaker positions are known, the performance can be improved further by applying a directional weighting to the interpolation procedure. The proposed method is evaluated by simulating broadband sound in a reverberant environment, focusing on the case where microphone placement is restricted. The proposed method with directional weighting outperforms the pointwise VAST over the full bandwidth of the signal, and the proposed method without directional weighting outperforms the pointwise VAST at low frequencies.
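For reference, the pointwise VAST baseline has a closed form: with the joint diagonalization R_b u_i = lambda_i R_d u_i (eigenvectors normalized so that u_i^H R_d u_j = delta_ij), the rank-V trade-off filter is q = sum_{i=1}^{V} u_i (u_i^H r) / (lambda_i + mu), where R_b and R_d are the bright- and dark-zone spatial correlation matrices, r is the cross-correlation with the desired signal, V is the span, and mu the trade-off parameter. The NumPy sketch below implements this baseline form with random stand-in statistics; in the proposed method, R_b, R_d, and r would instead be built by kernel-interpolation weighting over the continuous regions, which the sketch leaves abstract.

```python
import numpy as np
from scipy.linalg import eigh

def vast_filter(R_b, R_d, r, span, mu):
    """Variable span trade-off filter from the joint diagonalization of
    (R_b, R_d); eigh solves R_b u = lam * R_d u with u^H R_d u = I."""
    lam, U = eigh(R_b, R_d)            # ascending generalized eigenvalues
    lam, U = lam[::-1], U[:, ::-1]     # strongest bright/dark ratios first
    U_v, lam_v = U[:, :span], lam[:span]
    return U_v @ ((U_v.conj().T @ r) / (lam_v + mu))

# Toy problem: 8 control-filter coefficients, random stand-in statistics.
rng = np.random.default_rng(0)
L = 8
A, B = rng.standard_normal((32, L)), rng.standard_normal((32, L))
R_b = A.T @ A                          # bright-zone correlation (stand-in)
R_d = B.T @ B + 1e-3 * np.eye(L)       # dark-zone correlation (stand-in)
r = rng.standard_normal(L)             # desired-signal cross-correlation
q = vast_filter(R_b, R_d, r, span=4, mu=1.0)
print(q)
```

Sweeping the span V from 1 to L together with mu traces the known trade-off between acoustic contrast control and pressure matching.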
Citations: 6
Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context
Zejia Fan, Jiaying Liu, Wenhan Yang, Wei Xiang, Zongming Guo
Video super-resolution methods typically rely on paired training data, in which the low-resolution frames are usually synthetically generated under predetermined degradation conditions (e.g., bicubic downsampling). However, in real applications, obtaining this kind of training data is labor-intensive and expensive, which limits the practical performance of these methods. To address this issue and dispense with synthetic paired data, in this paper we explore utilizing the internal self-similarity redundancy within a video to build a Self-Learned Video Super-Resolution (SLVSR) method, which only needs to be trained on the input test video itself. We employ a series of data augmentation strategies to make full use of the spatial and temporal context of the target video clips. The idea is applied to two branches of mainstream SR methods: frame fusion and frame recurrence methods. Since the former takes advantage of short-term temporal consistency and the latter of long-term consistency, our method can handle different practical situations. The experimental results show the superiority of our proposed method, especially in addressing video super-resolution problems in real applications.
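The self-learned setup needs no external pairs: the sketch below builds pseudo LR/HR training pairs from the test clip itself (the input frames serve as HR targets for their further-downsampled copies) and augments spatial and temporal context with flips and frame-order reversal. The specific augmentations and the bicubic degradation are illustrative assumptions in the spirit of the method.

```python
import torch
import torch.nn.functional as F

def make_training_pairs(clip, scale=2):
    """Pseudo pairs from the clip itself: input frames act as HR targets,
    their further-downsampled copies as LR inputs."""
    b, t, c, h, w = clip.shape
    lr = F.interpolate(clip.reshape(b * t, c, h, w), scale_factor=1 / scale,
                       mode="bicubic", align_corners=False)
    return lr.reshape(b, t, c, h // scale, w // scale), clip

def augment(clip):
    """Cheap spatial/temporal context augmentation on the test clip."""
    return torch.cat([clip,
                      torch.flip(clip, dims=[-1]),  # horizontal flip (spatial)
                      torch.flip(clip, dims=[1])],  # reversed order (temporal)
                     dim=0)

video = torch.rand(1, 7, 3, 128, 128)   # the test clip itself (stand-in)
lr, hr = make_training_pairs(augment(video))
print(lr.shape, hr.shape)  # (3, 7, 3, 64, 64) and (3, 7, 3, 128, 128)
```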
Citations: 0
Single-Shot Balanced Detector for Geospatial Object Detection
Yanfeng Liu, Qiang Li, Yuan Yuan, Qi Wang
Geospatial object detection is an essential task in the remote sensing community. One-stage methods based on deep learning run faster but cannot match the detection accuracy of two-stage methods. In this paper, to achieve an excellent speed/accuracy trade-off for geospatial object detection, a single-shot balanced detector is presented. First, a balanced feature pyramid network (BFPN) is designed, which adaptively balances semantic and spatial information between high-level and shallow-level features. Second, we propose a task-interactive head (TIH), which reduces the task misalignment between classification and regression. Extensive experiments show that the improved detector achieves significant detection accuracy at considerable speed on two benchmark datasets.
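The balancing idea can be sketched briefly: gather all pyramid levels at a middle resolution, average them into one integrated map (so every level contributes equally), then redistribute the map back to each level as a residual. This mirrors the balanced-feature-pyramid pattern known from the detection literature; the paper's BFPN details may differ, and the shapes below are illustrative.

```python
import torch
import torch.nn.functional as F

def balanced_feature_pyramid(feats):
    """Rescale all levels to a middle resolution, average, redistribute."""
    mid_size = feats[len(feats) // 2].shape[-2:]
    gathered = [F.interpolate(f, size=mid_size, mode="nearest") for f in feats]
    integrated = torch.stack(gathered).mean(dim=0)   # balanced information
    return [f + F.interpolate(integrated, size=f.shape[-2:], mode="nearest")
            for f in feats]

# Toy FPN levels (strides 8/16/32 over a 256x256 image, 16 channels each).
p3, p4, p5 = (torch.randn(1, 16, s, s) for s in (32, 16, 8))
outs = balanced_feature_pyramid([p3, p4, p5])
print([tuple(o.shape) for o in outs])
```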
Citations: 10