
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Group Sparsity Based Target Localization for Distributed Sensor Array Networks
Qing Shen, Wei Liu, Li Wang, Yin Liu
The target localization problem for distributed sensor array networks where a sub-array is placed at each receiver is studied, and under the compressive sensing (CS) framework, a group sparsity based two-dimensional localization method is proposed. Instead of fusing the separately estimated angles of arrival (AOAs), it processes the information collected by all the receivers simultaneously to form the final target locations. Simulation results show that the proposed localization method provides a significant performance improvement compared with the commonly used maximum likelihood estimator (MLE).
DOI: 10.1109/ICASSP.2019.8683867 (https://doi.org/10.1109/ICASSP.2019.8683867) | Pages: 4190-4194 | Published: 2019-05-12
Citations: 3
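The abstract above does not say which solver the authors use. As a generic illustration of what "group sparsity" means in practice, the sketch below applies group soft-thresholding, a standard building block in group-sparse recovery. The grouping (one group per candidate grid location, holding one coefficient per receiver), the function name, and the numbers are assumptions made for illustration only.

```python
import math

def group_soft_threshold(groups, tau):
    """Proximal operator of tau * sum_g ||x_g||_2, applied group by group."""
    shrunk = []
    for g in groups:
        norm = math.sqrt(sum(v * v for v in g))
        # A group whose l2 norm is at most tau is zeroed out as a whole;
        # larger groups are scaled toward zero but keep their direction.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * v for v in g])
    return shrunk

# Hypothetical data: each inner list holds the coefficients that two
# receivers assign to the same candidate grid location.
coeffs = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
result = group_soft_threshold(coeffs, tau=1.0)

# Grid locations that survive thresholding across all receivers jointly.
active = [i for i, g in enumerate(result) if any(v != 0.0 for v in g)]
print(active)
```

Because a whole group is kept or discarded together, a location is either supported by the receivers jointly or dropped, which is the behaviour that separates group sparsity from entry-wise sparsity.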
Estimation of Network Processes via Blind Graph Multi-filter Identification
Yu Zhu, F. J. Garcia, A. Marques, Santiago Segarra
We study the problem of jointly estimating several network processes that are driven by the same input, recasting it as one of blind identification of a bank of graph filters. More precisely, we consider the observation of several graph signals – i.e., signals defined on the nodes of a graph – and we model each of these signals as the output of a different network process (represented by a graph filter) defined on a common known graph and driven by a common unknown input. Our goal is to recover the specifications of every network process by only observing the outputs. Since every process shares the same input, the estimation problems are coupled, and a joint inference method is proposed. We study two different scenarios, one where the orders of the filters are known, and one where they are not. For the former case we propose a least-squares approach and provide conditions for recovery. For the latter case, we put forth a sparse recovery algorithm with theoretical guarantees. Finally, we illustrate the methods here proposed via numerical experiments.
DOI: 10.1109/ICASSP.2019.8683844 (https://doi.org/10.1109/ICASSP.2019.8683844) | Pages: 5451-5455 | Published: 2019-05-12
Citations: 4
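To make the forward model in the abstract above concrete, the sketch below generates the outputs of two graph filters driven by one common input. It assumes the common polynomial form of a graph filter, y = sum_k h[k] S^k x, where S is a graph shift operator such as the adjacency matrix; the abstract itself does not state this form. The graph, the coefficients, and the function names are hypothetical.

```python
def mat_vec(S, x):
    """Multiply the square matrix S (list of rows) by the vector x."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]

def graph_filter(S, h, x):
    """Return y = sum_k h[k] * S^k x, a polynomial graph filter applied to x."""
    y = [0.0] * len(x)
    shifted = list(x)                      # S^0 x
    for coeff in h:
        y = [yi + coeff * si for yi, si in zip(y, shifted)]
        shifted = mat_vec(S, shifted)      # advance to S^(k+1) x
    return y

# Hypothetical 3-node directed cycle: node i receives from node i-1.
S = [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = [1.0, 0.0, 0.0]                        # common input (unknown in the blind setting)
filters = [[1.0, 0.5], [0.0, 1.0, 2.0]]    # two processes with different orders
outputs = [graph_filter(S, h, x) for h in filters]
print(outputs)
```

In the blind problem only `outputs` and `S` would be observed; both the coefficient lists in `filters` and the input `x` would have to be recovered.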
An Efficient Algorithm for Hyperspectral Image Clustering
Yushu Pan, Yuchen Jiao, Tiejian Li, Yuantao Gu
Clustering hyperspectral images (HSIs) is a challenging and valuable task because of their inherent complexity and abundant spectral information. Sparse subspace clustering (SSC) and SSC-based methods are widely used for this problem and demonstrate excellent performance. However, because HSIs are usually high-dimensional, these methods are computationally expensive owing to their reliance on SSC. To solve this problem, we propose a novel approach called SuperPixel and Angle-based HyperSpectral Image Clustering (SPAHSIC). It first extracts the local spectral and spatial information between pixels by superpixel segmentation, and then applies spectral clustering to a similarity matrix built from subspace principal angles. Experiments on real datasets yield high accuracy, which indicates the effectiveness of our algorithm.
DOI: 10.1109/ICASSP.2019.8683309 (https://doi.org/10.1109/ICASSP.2019.8683309) | Pages: 2167-2171 | Published: 2019-05-12
Citations: 2
Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec
D. McGrath, S. Bruhn, H. Purnhagen, Michael Eckert, Juan Torres, Stefanie Brown, Dan Darcy
Virtual Reality (VR) audio scenes may be composed of a very large number of audio elements, including dynamic audio objects, fixed audio channels and scene-based audio elements such as Higher Order Ambisonics (HOA). Potentially, the subjective listening experience may be replicated using a compact spatial format with a set number of dynamic objects and scene-based elements, retaining only the perceptual essence of the audio scene. The compact format would further enable a reduction in the complexity of subsequent compression and rendering. This paper investigates these hypotheses by exploring the use of a compact format that consists of up to four dynamic objects and nine HOA channels, with the Enhanced Voice Services (EVS) codec being applied to a 4-channel down-mix of the compact format.
DOI: 10.1109/ICASSP.2019.8683712 (https://doi.org/10.1109/ICASSP.2019.8683712) | Pages: 730-734 | Published: 2019-05-12
Citations: 3
Nonlinear Acceleration of Constrained Optimization Algorithms
Vien V. Mai, M. Johansson
This paper introduces a novel technique for nonlinear acceleration of first-order methods for constrained convex optimization. Previous studies of nonlinear acceleration have only been able to provide convergence guarantees for unconstrained convex optimization. In contrast, our method is able to avoid infeasibility of the accelerated iterates and retains the theoretical performance guarantees of the unconstrained case. We focus on Anderson acceleration of the classical projected gradient descent (PGD) method, but our techniques can easily be extended to more sophisticated algorithms, such as mirror descent. Due to the presence of a constraint set, the relevant fixed-point mapping for PGD is not differentiable. However, we show that the convergence results for Anderson acceleration of smooth fixed-point iterations can be extended to the non-smooth case under certain technical conditions.
DOI: 10.1109/ICASSP.2019.8682962 (https://doi.org/10.1109/ICASSP.2019.8682962) | Pages: 4903-4907 | Published: 2019-05-12
Citations: 9
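The idea of Anderson-accelerating projected gradient descent, as named in the abstract above, can be sketched on a toy problem. The code below uses memory-one Anderson extrapolation on the PGD fixed-point map for a box-constrained separable quadratic. The two safeguards (re-projecting the extrapolated point, and falling back to the plain PGD step when the extrapolation does not lower the objective) are assumptions made here to keep the sketch feasible and monotone; the abstract does not describe how the paper handles infeasibility, and this is not a reproduction of the authors' method. All names and problem data are hypothetical.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(v, lo), hi) for v in x]

def objective(x, a, b):
    """f(x) = sum_i (0.5 * a_i * x_i^2 - b_i * x_i), a separable convex quadratic."""
    return sum(0.5 * ai * v * v - bi * v for ai, bi, v in zip(a, b, x))

def pgd_map(x, a, b, step):
    """One projected gradient step; its fixed points are the constrained minimizers."""
    grad = [ai * v - bi for ai, bi, v in zip(a, b, x)]
    return project_box([v - step * g for v, g in zip(x, grad)])

def anderson_pgd(x0, a, b, iters=100):
    step = 1.0 / max(a)                    # 1/L for this quadratic
    x, x_prev, g_prev = list(x0), None, None
    for _ in range(iters):
        g = pgd_map(x, a, b, step)
        candidate = g
        if x_prev is not None:
            r = [gi - xi for gi, xi in zip(g, x)]                  # current residual
            r_prev = [gi - xi for gi, xi in zip(g_prev, x_prev)]   # previous residual
            dr = [ri - rpi for ri, rpi in zip(r, r_prev)]
            denom = sum(d * d for d in dr)
            if denom > 0.0:
                # Memory-one Anderson weight: least-squares fit of r against dr.
                gamma = sum(ri * d for ri, d in zip(r, dr)) / denom
                extrap = [gi - gamma * (gi - gpi) for gi, gpi in zip(g, g_prev)]
                extrap = project_box(extrap)        # assumed safeguard: stay feasible
                if objective(extrap, a, b) <= objective(g, a, b):
                    candidate = extrap              # assumed safeguard: never do worse than PGD
        x_prev, g_prev = x, g
        x = candidate
    return x

a, b = [1.0, 10.0], [2.0, 5.0]
solution = anderson_pgd([0.0, 0.0], a, b)
print(solution)
```

For this separable problem the minimizer over the box is each b_i / a_i clipped to [0, 1], which gives a simple reference point for checking the result.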
Block-randomized Stochastic Proximal Gradient for Constrained Low-rank Tensor Factorization
Xiao Fu, Cheng Gao, Hoi-To Wai, Kejun Huang
This work focuses on canonical polyadic decomposition (CPD) for large-scale tensors. Many prior works rely on data sparsity to develop scalable CPD algorithms, which are not suitable for handling dense tensors, while dense tensors often arise in applications such as image and video processing. As an alternative, stochastic algorithms utilize data sampling to reduce per-iteration complexity and thus are very scalable, even when handling dense tensors. However, existing stochastic CPD algorithms face some challenges. For example, some algorithms are based on randomly sampled tensor entries, and thus each iteration can only update a small portion of the latent factors. This may result in slow improvement of the estimation accuracy of the latent factors. In addition, the convergence properties of many stochastic CPD algorithms are unclear, perhaps because CPD poses a hard nonconvex problem and is challenging to analyze under stochastic settings. In this work, we propose a stochastic optimization strategy that can effectively circumvent the above challenges. The proposed algorithm updates a whole latent factor at each iteration using sampled fibers of a tensor, which can quickly increase the estimation accuracy. The algorithm is flexible: many commonly used regularizers and constraints can be easily incorporated in the computational framework. The algorithm is also backed by a rigorous convergence theory. Simulations on large-scale dense tensors are employed to showcase the effectiveness of the algorithm.
DOI: 10.1109/ICASSP.2019.8682465 (https://doi.org/10.1109/ICASSP.2019.8682465) | Pages: 7485-7489 | Published: 2019-05-12
Citations: 3
Surgical Activities Recognition Using Multi-scale Recurrent Networks
Ilker Gurcan, H. Nguyen
Recently, surgical activity recognition has been receiving significant attention from the medical imaging community. Existing state-of-the-art approaches employ recurrent neural networks such as long short-term memory networks (LSTMs). However, our experiments show that these networks are not effective in capturing the relationship of features with different temporal scales. This limitation leads to sub-optimal recognition performance for surgical activities containing complex motions at multiple time scales. To overcome this shortcoming, our paper proposes a multi-scale recurrent neural network (MS-RNN) that combines the strengths of both wavelet scattering operations and LSTM. We validate the effectiveness of the proposed network using both real and synthetic datasets. Our experimental results show that MS-RNN outperforms state-of-the-art methods in surgical activity recognition by a significant margin. On a synthetic dataset, the proposed network achieves more than 90% classification accuracy while LSTM's accuracy is around chance level. Experiments on a real surgical activity dataset show a significant improvement in recognition accuracy over the current state of the art (90.2% versus 83.3%).
DOI: 10.1109/ICASSP.2019.8683849 (https://doi.org/10.1109/ICASSP.2019.8683849) | Pages: 2887-2891 | Published: 2019-05-12
Citations: 8
Multi-step Self-attention Network for Cross-modal Retrieval Based on a Limited Text Space
Zheng Yu, Wenmin Wang, Ge Li
Cross-modal retrieval has been recently proposed to find an appropriate subspace where the similarity among different modalities, such as image and text, can be directly measured. In this paper, we propose the Multi-step Self-Attention Network (MSAN) to perform cross-modal retrieval in a limited text space with multiple attention steps; it selectively attends to partial shared information at each step and aggregates useful information over multiple steps to measure the final similarity. In order to achieve better retrieval results with faster training speed, we introduce global prior knowledge as the global reference information. Extensive experiments on Flickr30K and MSCOCO show that MSAN achieves new state-of-the-art results in accuracy for cross-modal retrieval.
DOI: 10.1109/ICASSP.2019.8682424 (https://doi.org/10.1109/ICASSP.2019.8682424) | Pages: 2082-2086 | Published: 2019-05-12
Citations: 0
Reflection Symmetry Detection by Embedding Symmetry in a Graph
R. Nagar, S. Raman
Reflection symmetry is ubiquitous in nature and plays an important role in object detection and recognition tasks. Most of the existing methods for symmetry detection extract and describe each keypoint using a descriptor and a mirrored descriptor. Two keypoints are said to be mirror symmetric keypoints if the original descriptor of one keypoint and the mirrored descriptor of the other keypoint are similar. However, these methods suffer from the following issue. The background pixels around the mirror symmetric pixels lying on the boundary of an object can be different. Therefore, their descriptors can be different. However, the boundary of a symmetric object is a major component of global reflection symmetry. We exploit the estimated boundary of the object and describe a boundary pixel using only the estimated normal of the boundary segment around the pixel. We embed the symmetry axes in a graph as cliques to robustly detect the symmetry axes. We show that this approach achieves state-of-the-art results on a standard dataset.
DOI: 10.1109/ICASSP.2019.8682412 (https://doi.org/10.1109/ICASSP.2019.8682412) | Pages: 2147-2151 | Published: 2019-05-12
Citations: 4
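The abstract above describes each boundary pixel only by the estimated normal of its boundary segment. One simple way to compare two such pixels, sketched below, is to reflect one normal across the perpendicular bisector of the segment joining the pixels and check how well it agrees with the other normal. This cosine score, the function names, and the sample points are assumptions for illustration; the abstract does not specify the paper's actual similarity measure or its graph construction.

```python
import math

def reflect_across_bisector(p, q, v):
    """Reflect vector v across the perpendicular bisector of segment pq."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(ux, uy)            # p and q are assumed to be distinct
    ux, uy = ux / length, uy / length      # unit normal of the candidate axis
    dot = v[0] * ux + v[1] * uy
    return (v[0] - 2.0 * dot * ux, v[1] - 2.0 * dot * uy)

def mirror_score(p, n_p, q, n_q):
    """Cosine between n_q and the mirrored n_p; 1.0 means perfectly mirror symmetric."""
    mx, my = reflect_across_bisector(p, q, n_p)
    return mx * n_q[0] + my * n_q[1]

# Hypothetical boundary pixels with unit normals.
symmetric = mirror_score((0.0, 0.0), (-0.6, 0.8), (2.0, 0.0), (0.6, 0.8))
asymmetric = mirror_score((0.0, 0.0), (-0.6, 0.8), (2.0, 0.0), (-0.6, 0.8))
print(round(symmetric, 6), round(asymmetric, 6))
```

Because the score depends only on the pixel positions and their normals, it is unaffected by differing background pixels, which is the weakness of descriptor-based matching that the abstract points out.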
Convex Combination of Constraint Vectors for Set-membership Affine Projection Algorithms
T. Ferreira, W. Martins, Markus V. S. Lima, P. Diniz
Set-membership affine projection (SM-AP) adaptive filters have been increasingly employed in the context of online data-selective learning. A key aspect for their good performance in terms of both convergence speed and steady-state mean-squared error is the choice of the so-called constraint vector. Optimal constraint vectors were recently proposed relying on convex optimization tools, which might sometimes lead to prohibitive computational burden. This paper proposes a convex combination of simpler constraint vectors whose performance approaches the optimal solution closely, utilizing much fewer computations. Some illustrative examples confirm that the sub-optimal solution follows the accomplishments of the optimal one.
DOI: 10.1109/ICASSP.2019.8682305 (https://doi.org/10.1109/ICASSP.2019.8682305) | Pages: 4858-4862 | Published: 2019-05-12
Citations: 3
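The abstract above does not spell out the constraint vectors it combines. As background, the sketch below shows set-membership NLMS, the simplest single-constraint member of the set-membership family, chosen here for brevity and not taken from the paper. It illustrates the two ideas the abstract relies on: updates happen only when the error exceeds a bound (data-selective learning), and after an update the a posteriori error lands on a prescribed value, which is the scalar analogue of a constraint vector. The function name and the sample data are hypothetical.

```python
def sm_nlms_update(w, x, d, gamma_bar):
    """One set-membership NLMS step; returns (new_weights, updated_flag)."""
    err = d - sum(wi * xi for wi, xi in zip(w, x))
    if abs(err) <= gamma_bar:
        return list(w), False              # error already within the bound: skip the update
    energy = sum(xi * xi for xi in x)      # assumed nonzero in this sketch
    mu = 1.0 - gamma_bar / abs(err)        # step size that places the new error on the bound
    return [wi + mu * err * xi / energy for wi, xi in zip(w, x)], True

x, d, gamma_bar = [1.0, 2.0], 3.0, 0.5
w, updated = sm_nlms_update([0.0, 0.0], x, d, gamma_bar)
posterior = d - sum(wi * xi for wi, xi in zip(w, x))

# A second sample whose error is already small triggers no update.
w_same, updated_again = sm_nlms_update(w, x, 2.6, gamma_bar)
print(updated, round(posterior, 6), updated_again)
```

In the affine projection generalization, several past samples are constrained at once, so the single target error value becomes a vector, and choosing that vector is the design question the paper addresses.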