
Latest Publications: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Range Guided Depth Refinement and Uncertainty-Aware Aggregation for View Synthesis
Yuan Chang, Yisong Chen, Guoping Wang
In this paper, we present a view synthesis framework comprising range-guided depth refinement and uncertainty-aware aggregation for novel view synthesis. We first propose a novel depth refinement method to improve the quality and robustness of depth map reconstruction. To that end, we use a range prior to constrain the estimated depth, which yields more accurate depth information. We then propose an uncertainty-aware aggregation method for novel view synthesis: we compute the uncertainty of the estimated depth at each pixel and reduce the influence of pixels with high uncertainty when synthesizing novel views. This step helps reduce artifacts such as ghosting and blurring. We validate our algorithm experimentally and show that it achieves state-of-the-art performance.
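To make the aggregation step concrete, below is a minimal NumPy sketch of uncertainty-weighted view blending: pixels with large depth uncertainty contribute less to the synthesized view. The inverse-uncertainty weighting and the epsilon constant are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def aggregate_views(warped_views, uncertainties, eps=1e-6):
    """Blend candidate views per pixel, down-weighting pixels whose
    estimated depth is uncertain (assumed inverse-uncertainty weights).

    warped_views:  (N, H, W, 3) candidate images warped to the target view
    uncertainties: (N, H, W)    per-pixel depth uncertainty per candidate
    """
    weights = 1.0 / (uncertainties + eps)           # large uncertainty -> small weight
    weights /= weights.sum(axis=0, keepdims=True)   # normalize over candidates
    return (weights[..., None] * warped_views).sum(axis=0)

# Toy usage: two warped candidates; the second is unreliable in the left columns.
views = np.random.rand(2, 4, 4, 3)
unc = np.ones((2, 4, 4)); unc[1, :, :2] = 10.0
out = aggregate_views(views, unc)
print(out.shape)  # (4, 4, 3)
```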
Citations: 0
RGLN: Robust Residual Graph Learning Networks via Similarity-Preserving Mapping on Graphs
Jiaxiang Tang, Xiang Gao, Wei Hu
Graph Convolutional Neural Networks (GCNNs) extend CNNs to irregular graph data domains such as brain networks, citation networks, and 3D point clouds. Identifying an appropriate graph for the basic operations in GCNNs is critical. Existing methods often manually construct or learn one fixed graph based on known connectivities, which may be sub-optimal. To this end, we propose a residual graph learning paradigm to infer edge connectivities and weights in graphs, cast as distance metric learning under a low-rank assumption and a similarity-preserving regularization. In particular, we learn the underlying graph via similarity-preserving mapping on graphs, which keeps similar nodes close and pushes dissimilar nodes apart. Extensive experiments on semi-supervised learning over citation networks and 3D point clouds show that we achieve state-of-the-art performance in terms of both accuracy and robustness.
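A minimal PyTorch sketch of the core idea: edge weights derived from distances under a learned low-rank metric, added as a residual to an initial graph. The Gaussian kernel, layer sizes, and the residual combination rule are assumptions for illustration; the paper's exact parameterization may differ.

```python
import torch

class ResidualGraphLearner(torch.nn.Module):
    """Sketch: learn edge weights from node features under a low-rank metric
    and add them as a residual to a given initial graph (assumed design)."""
    def __init__(self, feat_dim, rank=8, alpha=0.5):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(feat_dim, rank) * 0.1)  # low-rank factor
        self.alpha = alpha

    def forward(self, X, A_init):
        Z = X @ self.W                          # (N, rank) projected features
        d2 = torch.cdist(Z, Z).pow(2)           # squared metric distances
        A_learned = torch.exp(-d2)              # similar nodes -> large edge weights
        return A_init + self.alpha * A_learned  # residual graph

X = torch.randn(5, 16)   # 5 nodes, 16-dim features
A0 = torch.eye(5)        # initial (e.g., known-connectivity) graph
A = ResidualGraphLearner(16)(X, A0)
print(A.shape)  # torch.Size([5, 5])
```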
Citations: 4
Improving Ultrasound Tongue Contour Extraction Using U-Net and Shape Consistency-Based Regularizer
Ming Feng, Yin Wang, Kele Xu, Huaimin Wang, Bo Ding
B-mode ultrasound tongue imaging is widely used to visualize tongue motion due to its appealing properties. Extracting the tongue surface contour from a B-mode ultrasound image remains a challenge, yet it is a prerequisite for further quantitative analysis. Recently, deep learning-based approaches have been adopted for this task. However, standard deep models fail to handle the faint contours that occur when the ultrasound wave travels parallel to the tongue surface. To address faint or missing contours in a sequence, we explore a shape consistency-based regularizer that takes sequential information into account. By incorporating the regularizer, the deep model not only extracts frame-specific contours but also enforces similarity between contours extracted from adjacent frames. Extensive experiments conducted on both synthetic and real ultrasound tongue imaging datasets demonstrate the effectiveness of the proposed method. To better promote research in this field, we have released our code.1
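The regularizer can be sketched as a frame-wise contour loss plus an adjacent-frame similarity penalty. In the sketch below, the binary cross-entropy contour loss, the L2 form of the consistency term, and the weight lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def contour_loss_with_consistency(preds, targets, lam=0.1):
    """preds, targets: (T, H, W) per-frame contour probability maps.
    Frame-wise BCE plus a penalty on differences between adjacent frames
    (assumed L2 form of the shape-consistency regularizer)."""
    bce = torch.nn.functional.binary_cross_entropy(preds, targets)
    consistency = (preds[1:] - preds[:-1]).pow(2).mean()  # adjacent-frame similarity
    return bce + lam * consistency

preds = torch.sigmoid(torch.randn(8, 64, 64))          # 8-frame sequence
targets = (torch.rand(8, 64, 64) > 0.5).float()
print(contour_loss_with_consistency(preds, targets).item())
```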
Citations: 2
On Loss Functions for Deep-Learning Based T60 Estimation
Yuying Li, Yuchen Liu, D. Williamson
Reverberation time, T60, directly influences the amount of reverberation in a signal, and its direct estimation may help with dereverberation. Traditionally, T60 estimation has been performed using signal processing or probabilistic approaches; only recently have deep-learning approaches been developed. Unfortunately, the appropriate loss function for training such a network has not been adequately determined. In this paper, we propose a composite classification- and regression-based cost function for training a deep neural network that predicts T60 for a variety of reverberant signals. We investigate pure-classification, pure-regression, and combined classification-regression loss functions, where we additionally incorporate computational measures of success. Our results reveal that our composite loss function leads to the best performance compared to other loss functions and comparison approaches. We also show that this combined loss function helps with generalization.
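A hedged sketch of such a composite loss, assuming T60 is additionally discretized into bins for the classification branch; the bin edges and weighting factor alpha are illustrative, not the paper's values.

```python
import torch

def composite_t60_loss(class_logits, reg_pred, t60_true, bin_edges, alpha=1.0):
    """Composite loss: cross-entropy over discretized T60 bins plus MSE on the
    continuous estimate (binning scheme and alpha are assumptions)."""
    class_target = torch.bucketize(t60_true, bin_edges)  # map true T60 to a bin index
    ce = torch.nn.functional.cross_entropy(class_logits, class_target)
    mse = torch.nn.functional.mse_loss(reg_pred, t60_true)
    return ce + alpha * mse

bin_edges = torch.tensor([0.3, 0.6, 0.9, 1.2])  # seconds; yields 5 classes
logits = torch.randn(4, 5)                      # batch of 4, 5 class logits
reg = torch.rand(4) * 1.5                       # continuous T60 predictions
t60 = torch.rand(4) * 1.5                       # ground-truth T60
print(composite_t60_loss(logits, reg, t60, bin_edges).item())
```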
Citations: 4
Improved Intra Mode Coding Beyond AV1
Yize Jin, Liang Zhao, Xin Zhao, Shangyi Liu, A. Bovik
In AOMedia Video 1 (AV1), directional intra prediction modes are applied to model local texture patterns that exhibit certain directionality. Each intra prediction direction is represented by a nominal mode index and a delta angle. The delta angle is entropy coded using a context shared between luma and chroma, and the context is derived from the associated nominal mode. In this paper, two methods are proposed to further reduce the signaling cost of delta angles: cross-component delta angle coding and context-adaptive delta angle coding, which exploit the cross-component and spatial correlation of the delta angles, respectively. The proposed methods were implemented on top of a recent version of libaom. Experimental results show that the proposed cross-component delta angle coding achieves an average 0.4% BD-rate reduction with a 4% encoding time saving under all-intra configurations. Combining both methods achieves an average 1.2% BD-rate reduction.
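As a rough illustration of the context-derivation idea (emphatically not the AV1 or libaom specification), a hypothetical rule might map the nominal mode to a base context and, for cross-component coding, refine it with the co-located luma delta angle. The index ranges and the offset rule below are invented for illustration only.

```python
def delta_angle_context(nominal_mode, luma_delta_angle=None):
    """Hypothetical sketch of context selection for delta-angle coding:
    a base context per nominal direction, refined (cross-component idea)
    by the co-located luma delta angle when available. Illustrative only."""
    base = nominal_mode % 8                     # one context group per nominal direction
    if luma_delta_angle is None:
        return base                             # context-adaptive fallback
    # Cross-component coding: condition on the magnitude of the luma delta.
    offset = 0 if luma_delta_angle == 0 else (1 if abs(luma_delta_angle) <= 2 else 2)
    return base * 3 + offset

print(delta_angle_context(5), delta_angle_context(5, luma_delta_angle=-3))
```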
Citations: 3
HSAN: A Hierarchical Self-Attention Network for Multi-Turn Dialogue Generation
Yawei Kong, Lu Zhang, Can Ma, Cong Cao
In multi-turn dialogue systems, response generation depends not only on the sentences in context but also on the words within each utterance. Although many methods model words and utterances, problems remain, such as a tendency to generate generic responses. In this paper, we propose a hierarchical self-attention network, named HSAN, which attends to the important words and utterances in context simultaneously. First, we use a hierarchical encoder to update the word and utterance representations with their respective position information. Second, the response representations are updated by the masked self-attention module in the decoder. Finally, the relevance between the utterances and the response is computed by another self-attention module and used in the next response decoding step. In terms of both automatic metrics and human judgments, experimental results show that HSAN significantly outperforms all baselines on two common public datasets.
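A minimal PyTorch sketch of the two-level encoder idea: self-attention over the words within each utterance, then self-attention over the resulting utterance vectors. The dimensions, mean pooling, and the omission of positional encodings and the decoder are simplifying assumptions.

```python
import torch

class HierarchicalSelfAttention(torch.nn.Module):
    """Sketch: word-level self-attention inside each utterance, then
    utterance-level self-attention over pooled utterance vectors."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.word_attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.utt_attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                      # x: (num_utterances, num_words, d_model)
        w, _ = self.word_attn(x, x, x)         # attend over words per utterance
        utt = w.mean(dim=1).unsqueeze(0)       # pool words -> (1, num_utterances, d_model)
        ctx, _ = self.utt_attn(utt, utt, utt)  # attend over utterances
        return ctx.squeeze(0)                  # (num_utterances, d_model)

out = HierarchicalSelfAttention()(torch.randn(3, 10, 64))  # 3 utterances, 10 words each
print(out.shape)  # torch.Size([3, 64])
```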
Citations: 6
Seizure Detection Using Power Spectral Density via Hyperdimensional Computing
Lulu Ge, K. Parhi
Hyperdimensional (HD) computing holds promise for classifying two groups of data. This paper explores seizure detection from the electroencephalogram (EEG) of subjects with epilepsy using HD computing based on power spectral density (PSD) features. Publicly available intracranial EEG (iEEG) data collected from 4 dogs and 8 human patients in the Kaggle seizure detection contest are used. The paper explores two methods for classification. First, a few top-ranked PSD features from a small number of channels, selected by a prior classification, are used for HD classification. Second, all PSD features extracted from all channels are used as features for HD classification. It is shown that for about half the subjects the small feature set outperforms all features in the context of HD classification, while for the other half all features outperform the small set. HD classification achieves above 95% accuracy for six of the 12 subjects and between 85% and 95% accuracy for 4 subjects. For two subjects, the classification accuracy using HD computing is not as good as classical approaches such as support vector machine classifiers.
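A textbook-style sketch of HD classification with channel and level hypervectors: each PSD feature is quantized, bound to its channel's hypervector, bundled into one record vector, and compared against class prototypes. The bipolar binding/bundling scheme below is an assumed standard variant, not necessarily the authors' exact encoder; the random features make the toy prediction meaningless but show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10000  # hypervector dimensionality

def encode(psd_features, channel_hvs, level_hvs, n_levels=16):
    """Bundle channel-bound level hypervectors into one record vector."""
    levels = np.clip((psd_features * n_levels).astype(int), 0, n_levels - 1)
    bound = channel_hvs * level_hvs[levels]  # elementwise binding per channel
    return np.sign(bound.sum(axis=0))        # bundling by majority vote

n_channels = 8
channel_hvs = rng.choice([-1, 1], size=(n_channels, D))
level_hvs = rng.choice([-1, 1], size=(16, D))

# Class prototypes: bundle the encodings of training examples per class.
train_seizure = [encode(rng.random(n_channels), channel_hvs, level_hvs) for _ in range(20)]
train_normal = [encode(rng.random(n_channels), channel_hvs, level_hvs) for _ in range(20)]
proto = {"seizure": np.sign(np.sum(train_seizure, axis=0)),
         "normal": np.sign(np.sum(train_normal, axis=0))}

query = encode(rng.random(n_channels), channel_hvs, level_hvs)
pred = max(proto, key=lambda c: query @ proto[c])  # nearest prototype by dot product
print(pred)
```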
Citations: 6
An Adaptive Non-Linear Process for Under-Determined Virtual Microphone Beamforming
M. Bekrani, Anh H. T. Nguyen, Andy W. H. Khong
Virtual microphone beamforming techniques are attractive for devices limited by space constraints. These techniques synthesize virtual microphone signals via interpolation algorithms. We propose to extend existing virtual microphone signal interpolation by employing an adaptive non-linear (ANL) process for acoustic beamforming. The proposed ANL-based interpolation utilizes a target-presence probability criterion to determine the degree of non-linearity. The beamformer output is then derived using a combination of interpolations over target-inactive zones and target-active zones. This combination offers a trade-off between interference reduction and target signal distortion. We apply the proposed ANL-based interpolator to the maximum signal-to-noise ratio (MSNR) beamformer and compare its performance against conventional beamforming and virtual microphone based beamforming methods in under-determined situations.
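One common family of non-linear virtual microphone interpolators uses a power mean of the two physical microphone magnitudes, with the exponent controlling the degree of non-linearity. The sketch below, in which the target-presence probability maps linearly to the exponent (near-linear interpolation where the target is likely present, stronger suppression where it is absent) and the phase comes from the summed signals, is an illustrative assumption rather than the proposed ANL process itself.

```python
import numpy as np

def virtual_mic(X1, X2, p_target, beta_min=0.1):
    """Sketch of an adaptive non-linear virtual microphone in the STFT domain.
    X1, X2: complex spectrograms of the two physical mics; p_target in [0, 1].
    The power-mean interpolation and the probability-to-exponent mapping
    are illustrative assumptions."""
    # beta -> 1 (linear mean) where the target is present, beta -> beta_min
    # (geometric-mean-like, more suppressive) where it is absent.
    beta = beta_min + (1.0 - beta_min) * p_target
    mag = (0.5 * (np.abs(X1) ** beta + np.abs(X2) ** beta)) ** (1.0 / beta)
    phase = np.angle(X1 + X2)  # simple phase interpolation (assumed)
    return mag * np.exp(1j * phase)

F, T = 257, 100
X1 = np.random.randn(F, T) + 1j * np.random.randn(F, T)
X2 = np.random.randn(F, T) + 1j * np.random.randn(F, T)
p = np.random.rand(F, T)  # per-bin target-presence probability
print(virtual_mic(X1, X2, p).shape)  # (257, 100)
```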
Citations: 1
Dependence-Guided Multi-View Clustering
Xia Dong, Danyang Wu, F. Nie, Rong Wang, Xuelong Li
In this paper, we propose a novel approach called dependence-guided multi-view clustering (DGMC). Our model enhances the dependence between unified embedding learning and clustering, and promotes the dependence between the unified embedding and the embedding of each view. Specifically, DGMC learns a unified embedding and partitions the data jointly, so the clustering results can be obtained directly. A kernel dependence measure is employed to learn the unified embedding by forcing it to be close to the different views, so that the complex dependence among views can be captured. Moreover, an implicit-weight learning mechanism is provided to ensure the diversity of the views. An efficient algorithm with rigorous convergence analysis is derived to solve the proposed model. Experimental results demonstrate the advantages of the proposed method over the state of the art on real-world datasets.
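The kernel dependence measure can be illustrated with an empirical HSIC (Hilbert-Schmidt Independence Criterion) between the unified embedding and each view, the kind of quantity such a model would maximize; the Gaussian kernel and its bandwidth are assumed choices, not necessarily the paper's.

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC between paired samples X (n, dx) and Y (n, dy)
    with Gaussian kernels: a standard kernel dependence measure."""
    n = X.shape[0]
    def gram(Z):
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gram(X), gram(Y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy check: an embedding built from view1 depends more strongly on view1.
view1, view2 = np.random.randn(50, 10), np.random.randn(50, 20)
embedding = view1[:, :3] + 0.1 * np.random.randn(50, 3)
print(hsic(embedding, view1) > hsic(embedding, view2))  # typically True
```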
Citations: 3
Linear Multichannel Blind Source Separation based on Time-Frequency Mask Obtained by Harmonic/Percussive Sound Separation
Soichiro Oyabu, Daichi Kitamura, K. Yatabe
Determined blind source separation (BSS) extracts source signals by linear multichannel filtering. Its performance depends on the accuracy of source modeling, and hence existing BSS methods have proposed several source models. Recently, a new determined BSS algorithm that incorporates a time-frequency mask was proposed. It enables very flexible source modeling because the model is implicitly defined by a mask-generating function. Building on this framework, in this paper we propose a unification of determined BSS and harmonic/percussive sound separation (HPSS). HPSS is an important preprocessing step for musical applications. By incorporating HPSS, both harmonic and percussive instruments can be accurately modeled within determined BSS. The resulting algorithm estimates the demixing filter using the information obtained by an HPSS method. We also propose a stabilization method that is essential for the proposed algorithm. Our experiments show that the proposed method outperforms both HPSS and determined BSS methods, including independent low-rank matrix analysis.
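A median-filtering HPSS sketch showing how time-frequency masks of the kind that could guide such a BSS stage are obtained: harmonic energy is smooth along time, percussive energy along frequency. The kernel size, STFT settings, and Wiener-like soft-mask form are assumed choices, not the paper's exact configuration.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

def hpss_masks(x, fs, kernel=17):
    """Median-filtering HPSS: filter the magnitude spectrogram along time
    (harmonic) and frequency (percussive), then form soft masks."""
    _, _, X = stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)
    H = median_filter(S, size=(1, kernel))  # enhance horizontal (harmonic) ridges
    P = median_filter(S, size=(kernel, 1))  # enhance vertical (percussive) ridges
    mask_h = H**2 / (H**2 + P**2 + 1e-12)   # Wiener-like soft mask
    return mask_h, 1.0 - mask_h

fs = 16000
# Toy signal: a steady 440 Hz tone (harmonic) plus sparse clicks (percussive).
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs) + (np.random.rand(fs) < 0.01)
mh, mp = hpss_masks(x, fs)
print(mh.shape, mp.mean())
```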
Citations: 4