
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Adaptive Reduced-Dimensional Beamspace Beamformer Design by Analogue Beam Selection
Xiangrong Wang, E. Aboutanios
Adaptive beamforming of large antenna arrays is difficult to implement due to prohibitively high hardware cost and computational complexity. Antenna selection strategies have been used to maximize the output signal-to-interference-plus-noise ratio (SINR) with fewer antennas by optimizing the array configuration. However, antenna selection suffers significant performance degradation compared with the full-array system. In this paper, we consider a reduced-dimensional beamspace beamformer, in which analogue phase shifters adaptively synthesize a subset of orthogonal beams whose outputs are then processed by a beamspace beamformer. We examine the problem of adaptively selecting the beams most relevant to achieving nearly full-beamspace performance, particularly in the general case without any prior information. Simulation results demonstrate that beam selection retains the complexity advantages of antenna selection while improving on its output SINR.
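As a rough illustration of what "selecting a subset of orthogonal beams" means, the sketch below builds an orthonormal DFT beam set for an assumed N-element, half-wavelength uniform linear array and keeps the beams with the largest gain toward a known look direction. This is only a simple baseline: it assumes the target direction is known, whereas the paper selects beams adaptively to maximize output SINR, including the case without prior information. All function names are hypothetical.

```python
import cmath
import math


def steering_vector(n_elements, sin_theta):
    """Half-wavelength ULA response (assumed geometry): phase advances by pi*sin(theta) per element."""
    return [cmath.exp(1j * math.pi * n * sin_theta) for n in range(n_elements)]


def dft_beams(n_elements):
    """Orthonormal DFT beams; beam k points at sin(theta) = 2k/N (modulo 2)."""
    N = n_elements
    return [[cmath.exp(1j * 2 * math.pi * n * k / N) / math.sqrt(N) for n in range(N)]
            for k in range(N)]


def beam_gains(beams, a):
    """|b_k^H a|^2 for every beam k."""
    return [abs(sum(b.conjugate() * x for b, x in zip(beam, a))) ** 2 for beam in beams]


def select_beams(n_elements, sin_theta, n_select):
    """Keep the n_select beams with the largest gain toward sin_theta (known-direction baseline)."""
    gains = beam_gains(dft_beams(n_elements), steering_vector(n_elements, sin_theta))
    ranked = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    return sorted(ranked[:n_select]), gains
```

With N = 8, a source at sin θ = 0.25 falls exactly on beam 1 (gain N, all other beams near zero), while sin θ = 0.375 falls midway between beams 1 and 2, so both are selected.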
DOI: 10.1109/ICASSP.2019.8683360 (https://doi.org/10.1109/ICASSP.2019.8683360) | Pages: 4350-4354 | Published: 2019-05-12
Citations: 3
Reflection Symmetry Detection by Embedding Symmetry in a Graph
R. Nagar, S. Raman
Reflection symmetry is ubiquitous in nature and plays an important role in object detection and recognition tasks. Most existing methods for symmetry detection extract and describe each keypoint using both a descriptor and a mirrored descriptor. Two keypoints are said to be mirror-symmetric if the original descriptor of one keypoint and the mirrored descriptor of the other are similar. However, these methods suffer from the following issue: the background pixels around mirror-symmetric pixels lying on the boundary of an object can differ, so their descriptors can also differ, even though the boundary of a symmetric object is a major component of global reflection symmetry. We exploit the estimated boundary of the object and describe a boundary pixel using only the estimated normal of the boundary segment around that pixel. We embed the symmetry axes in a graph as cliques to detect them robustly. We show that this approach achieves state-of-the-art results on a standard dataset.
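The idea of describing a boundary pixel only by its normal can be sketched as follows: two boundary points are treated as a mirror pair if reflecting one normal across the perpendicular bisector of the two points reproduces the other normal, and the pair then votes for that bisector as a candidate axis in (θ, ρ) line form. This is an assumed, simplified matching rule; the angular tolerance, the parameterization and the function names are hypothetical, normals are assumed non-zero, and the paper's clique-based graph embedding is not reproduced.

```python
import math


def reflect(v, u):
    """Reflect vector v across the line whose unit normal is u: v - 2(v.u)u."""
    d = v[0] * u[0] + v[1] * u[1]
    return (v[0] - 2 * d * u[0], v[1] - 2 * d * u[1])


def candidate_axis(p1, n1, p2, n2, tol_deg=10.0):
    """Return (theta, rho) of the bisector axis if the normals are mirror images, else None.

    The axis is the set of points x with x . (cos theta, sin theta) = rho, theta in [0, pi).
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return None
    u = (dx / dist, dy / dist)  # unit normal of the perpendicular bisector
    r = reflect(n1, u)
    cos_ang = (r[0] * n2[0] + r[1] * n2[1]) / (math.hypot(*r) * math.hypot(*n2))
    cos_ang = max(-1.0, min(1.0, cos_ang))  # guard acos against rounding
    if math.degrees(math.acos(cos_ang)) > tol_deg:
        return None
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    theta = math.atan2(u[1], u[0])
    rho = mx * u[0] + my * u[1]
    # Canonicalize so that the same axis always gets the same (theta, rho).
    if theta < 0:
        theta, rho = theta + math.pi, -rho
    elif theta >= math.pi:
        theta, rho = theta - math.pi, -rho
    return theta, rho
```

Accumulating these (θ, ρ) votes over many boundary pairs would give candidate axes, which a later stage (in the paper, the graph of cliques) could then verify.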
DOI: 10.1109/ICASSP.2019.8682412 (https://doi.org/10.1109/ICASSP.2019.8682412) | Pages: 2147-2151 | Published: 2019-05-12
Citations: 4
Asymptotically Optimal Recovery of Gaussian Sources from Noisy Stationary Mixtures: the Least-noisy Maximally-separating Solution
A. Weiss, A. Yeredor
We address the problem of source separation from noisy mixtures in a semi-blind scenario, with stationary, temporally diverse Gaussian sources and known spectra. In such noisy models, a dilemma arises regarding the desired objective. On the one hand, a "maximally separating" solution, providing the minimal attainable Interference-to-Source Ratio (ISR), often suffers from significant residual noise. On the other hand, optimal Minimum Mean Square Error (MMSE) estimation yields estimates that are the "least distorted" versions of the true sources, often at the cost of a compromised ISR. Based on Maximum Likelihood (ML) estimation of the unknown underlying model parameters, we propose two ML-based estimates of the sources. One asymptotically coincides with the MMSE estimate of the sources, whereas the other asymptotically coincides with the (unbiased) "least-noisy maximally-separating" solution for this model. We prove the asymptotic optimality of the latter and present the corresponding Cramér-Rao lower bound. We discuss the differences in the principal properties of the proposed estimates and demonstrate them empirically using simulation results.
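To make the MMSE-versus-separation trade-off concrete, the worked equations below use a simplified static linear model and assume, unlike the paper (which estimates the parameters by ML), that the mixing matrix A (full column rank), the source covariance Σ_s and the noise covariance Σ_v are known and invertible; for stationary sources with known spectra the same expressions would apply per frequency bin. These are textbook linear-estimation results, not the paper's derivation.

```latex
\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{v}, \qquad
\mathbf{s} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_s), \quad
\mathbf{v} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_v)

\hat{\mathbf{s}}_{\mathrm{MMSE}}
  = \boldsymbol{\Sigma}_s \mathbf{A}^{H} \left( \mathbf{A} \boldsymbol{\Sigma}_s \mathbf{A}^{H} + \boldsymbol{\Sigma}_v \right)^{-1} \mathbf{x}
  = \left( \mathbf{A}^{H} \boldsymbol{\Sigma}_v^{-1} \mathbf{A} + \boldsymbol{\Sigma}_s^{-1} \right)^{-1} \mathbf{A}^{H} \boldsymbol{\Sigma}_v^{-1} \mathbf{x}

\hat{\mathbf{s}}_{\mathrm{U}}
  = \mathbf{W}_{\mathrm{U}} \mathbf{x}
  = \left( \mathbf{A}^{H} \boldsymbol{\Sigma}_v^{-1} \mathbf{A} \right)^{-1} \mathbf{A}^{H} \boldsymbol{\Sigma}_v^{-1} \mathbf{x},
  \qquad \mathbf{W}_{\mathrm{U}} \mathbf{A} = \mathbf{I}
```

Because W_U A = I, the second estimator leaves no interference when A is known exactly, and by the Gauss–Markov theorem it has the smallest noise covariance among all linear estimators with that property, which is one plausible reading of "least-noisy maximally-separating". The MMSE estimator differs only by the Σ_s^{-1} term, which generally makes W_MMSE A non-diagonal, trading some residual interference for a lower total error; the two coincide when Σ_s^{-1} is negligible compared with A^H Σ_v^{-1} A (high SNR).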
DOI: 10.1109/ICASSP.2019.8682761 (https://doi.org/10.1109/ICASSP.2019.8682761) | Pages: 5466-5470 | Published: 2019-05-12
Citations: 1
Evolutionary Subspace Clustering: Discovering Structure in Self-expressive Time-series Data
Abolfazl Hashemi, H. Vikalo
An evolutionary self-expressive model is proposed for clustering a collection of evolving data points that lie on a union of low-dimensional evolving subspaces. A parsimonious representation of the data points at each time step is learned via a non-convex optimization framework that exploits the self-expressiveness property of the evolving data while taking into account the data representation from the preceding time step. The resulting scheme adaptively learns an innovation matrix that captures changes in the self-representation of the data across consecutive time steps, as well as a smoothing parameter reflecting the rate of data evolution. Extensive experiments demonstrate the superiority of the proposed framework over state-of-the-art static subspace clustering algorithms and existing evolutionary clustering schemes.
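A minimal sketch of the self-expressive idea with a temporal term: each point is ridge-regressed on all other points (so diag(C) = 0), the current coefficients are blended with those from the previous time step, and points are grouped by connected components of |C| + |C|ᵀ. The fixed blending weight `alpha`, the ridge penalty `lam` and the connected-components step (used here instead of spectral clustering) are simplifying assumptions, and all names are hypothetical; the paper instead learns an innovation matrix and the smoothing parameter adaptively through a non-convex formulation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M c = b."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (A[r][n] - sum(A[r][k] * c[k] for k in range(r + 1, n))) / A[r][r]
    return c


def self_expressive(points, lam):
    """C[j][i] = weight of point j when ridge-regressing point i on all other points."""
    n = len(points)
    C = [[0.0] * n for _ in range(n)]  # diagonal stays zero by construction
    for i in range(n):
        others = [j for j in range(n) if j != i]
        G = [[dot(points[j], points[k]) + (lam if j == k else 0.0) for k in others]
             for j in others]
        rhs = [dot(points[j], points[i]) for j in others]
        for j, cj in zip(others, solve(G, rhs)):
            C[j][i] = cj
    return C


def evolve(C_prev, C_new, alpha):
    """Hypothetical temporal smoothing: blend the previous and current self-representations."""
    if C_prev is None:
        return C_new
    return [[alpha * p + (1 - alpha) * q for p, q in zip(rp, rq)]
            for rp, rq in zip(C_prev, C_new)]


def clusters(C, eps=1e-8):
    """Connected components of the affinity |C| + |C|^T (stand-in for spectral clustering)."""
    n = len(C)
    seen, groups = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if v not in seen and abs(C[u][v]) + abs(C[v][u]) > eps:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(comp))
    return groups
```

For two snapshots of points on two orthogonal lines, the cross-subspace coefficients are zero, so the blended representation still separates the points into their two subspaces.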
DOI: 10.1109/ICASSP.2019.8682405 (https://doi.org/10.1109/ICASSP.2019.8682405) | Pages: 3707-3711 | Published: 2019-05-12
Citations: 2
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition
Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, H. Meng
The hidden activation functions inside deep neural networks (DNNs) play a vital role in learning high-level discriminative features and controlling information flow to track longer history. However, the fixed model parameters used in standard DNNs can lead to over-fitting and poor generalization when training data are limited. Furthermore, the precise forms of the activations used in DNNs are often set manually at a global level for all hidden nodes, so no automatic selection method is available. To address these issues, this paper proposes Bayesian neural network (BNN) acoustic models that explicitly model the uncertainty associated with DNN parameters. Gaussian Process (GP) activation based DNN and LSTM acoustic models are also used to allow the optimal forms of the hidden activations to be learned stochastically for individual hidden nodes. An efficient training algorithm based on variational inference is derived for the BNN, GPNN and GPLSTM systems. Experiments were conducted on an LVCSR system trained on a 75-hour subset of the Switchboard I data. The best BNN and GPNN systems outperformed both the baseline DNN systems constructed using fixed-form activations and their combination via frame-level joint decoding by 1% absolute in word error rate.
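The Bayesian part of the approach can be illustrated with a generic mean-field variational linear layer: each weight has a Gaussian posterior N(μ, σ²) with σ = softplus(ρ), forward passes sample weights via the reparameterization trick, and the objective adds a closed-form KL term to a standard-normal prior. This is an assumed, textbook formulation with hypothetical names, not the paper's exact training algorithm; it only evaluates the negative ELBO (gradient updates would need an autodiff library) and does not cover the GP activations.

```python
import math
import random


def softplus(x):
    """Map an unconstrained parameter rho to a positive standard deviation."""
    return math.log1p(math.exp(x))


def kl_to_std_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a single weight."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - 2.0 * math.log(sigma))


class BayesianLinear:
    """Linear layer with a factorised Gaussian posterior over every weight."""

    def __init__(self, n_in, n_out, rng):
        self.mu = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.rho = [[-3.0] * n_in for _ in range(n_out)]  # small initial sigma (assumed)

    def sample_forward(self, x, rng):
        """One stochastic forward pass with w = mu + sigma * eps (reparameterization)."""
        out = []
        for mu_row, rho_row in zip(self.mu, self.rho):
            acc = 0.0
            for xi, m, r in zip(x, mu_row, rho_row):
                w = m + softplus(r) * rng.gauss(0.0, 1.0)
                acc += w * xi
            out.append(acc)
        return out

    def kl(self):
        """KL term of the ELBO, summed over all weights."""
        return sum(kl_to_std_normal(m, softplus(r))
                   for mu_row, rho_row in zip(self.mu, self.rho)
                   for m, r in zip(mu_row, rho_row))


def neg_elbo(layer, x, target, rng, n_samples=8):
    """Monte Carlo -ELBO with a unit-variance Gaussian likelihood (up to an additive constant)."""
    nll = 0.0
    for _ in range(n_samples):
        y = layer.sample_forward(x, rng)
        nll += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return nll / n_samples + layer.kl()
```

The KL term penalizes posteriors that drift far from the prior, which is the mechanism that counters over-fitting on limited training data.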
DOI: 10.1109/ICASSP.2019.8682487 (https://doi.org/10.1109/ICASSP.2019.8682487) | Pages: 6555-6559 | Published: 2019-05-12
Citations: 9
Neural Variational Identification and Filtering for Stochastic Non-linear Dynamical Systems with Application to Non-intrusive Load Monitoring
Henning Lange, M. Berges, J. Z. Kolter
In this paper, an algorithm is introduced for performing system identification and inference of the filtering recursion for stochastic non-linear dynamical systems. The algorithm also allows domain constraints on the state variable to be enforced. It uses an approximate inference technique called Variational Inference, with Deep Neural Networks as the optimization engine. Although general in nature, the algorithm is evaluated in the context of Non-Intrusive Load Monitoring: the problem of inferring the operational state of individual electrical appliances from aggregate measurements of electrical power collected in a home.
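To make the NILM "filtering recursion" concrete, the sketch below runs the classical forward filter of a hand-specified two-state (off/on) hidden Markov model for a single appliance, with Gaussian noise on the aggregate power. The switching probability, on-power and noise level are assumptions (with 0 < `p_stay` < 1), and the names are hypothetical; the paper instead identifies non-linear dynamics and learns the filter with variational inference and deep neural networks.

```python
import math


def log_gauss(y, mean, sd):
    """Log of the Gaussian density N(y; mean, sd^2)."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def forward_filter(readings, on_power, noise_sd, p_stay=0.95, p_on0=0.5):
    """Return P(appliance on | readings up to t) for every t (two-state HMM forward filter)."""
    p_on, out = p_on0, []
    for y in readings:
        # Predict: symmetric Markov switching between off and on.
        prior_on = p_on * p_stay + (1 - p_on) * (1 - p_stay)
        # Update: compare the aggregate reading with each state's expected power (log domain).
        a = math.log(prior_on) + log_gauss(y, on_power, noise_sd)
        b = math.log(1 - prior_on) + log_gauss(y, 0.0, noise_sd)
        m = max(a, b)  # log-sum-exp trick avoids underflow for outlying readings
        p_on = math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
        out.append(p_on)
    return out
```

Each step alternates a prediction through the state dynamics with a Bayesian update on the new measurement, which is the recursion the paper replaces with a learned, non-linear counterpart.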
DOI: 10.1109/ICASSP.2019.8683552 (https://doi.org/10.1109/ICASSP.2019.8683552) | Pages: 8340-8344 | Published: 2019-05-12
Citations: 6
Surgical Activities Recognition Using Multi-scale Recurrent Networks
Ilker Gurcan, H. Nguyen
Recently, surgical activity recognition has received significant attention from the medical imaging community. Existing state-of-the-art approaches employ recurrent neural networks such as long short-term memory networks (LSTMs). However, our experiments show that these networks are not effective at capturing the relationships among features at different temporal scales. This limitation leads to sub-optimal recognition of surgical activities containing complex motions at multiple time scales. To overcome this shortcoming, we propose a multi-scale recurrent neural network (MS-RNN) that combines the strengths of wavelet scattering operations and LSTMs. We validate the effectiveness of the proposed network on both real and synthetic datasets. Our experimental results show that MS-RNN outperforms state-of-the-art methods in surgical activity recognition by a significant margin. On a synthetic dataset, the proposed network achieves more than 90% classification accuracy, while the LSTM's accuracy is around chance level. Experiments on a real surgical activity dataset show a significant improvement in recognition accuracy over the current state of the art (90.2% versus 83.3%).
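The multi-scale front end can be approximated, very loosely, by a Haar wavelet decomposition whose detail coefficients are passed through a modulus and averaged per scale, giving one feature vector per sliding window that could then be fed to an LSTM. This is a simplified stand-in for a wavelet scattering transform, not the MS-RNN itself; the window length, hop, number of levels and function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    half = len(x) // 2
    a = [(x[2 * k] + x[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    d = [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return a, d


def multiscale_features(window, levels):
    """Mean |detail| per scale: a crude, scattering-like modulus-then-average feature."""
    feats, a = [], list(window)
    for _ in range(levels):
        if len(a) < 2:
            break
        a, d = haar_step(a)
        feats.append(sum(abs(v) for v in d) / len(d))
    return feats


def feature_sequence(signal, win, hop, levels):
    """Slide a window over a 1-D kinematic signal; one feature vector per frame (LSTM input)."""
    return [multiscale_features(signal[s:s + win], levels)
            for s in range(0, len(signal) - win + 1, hop)]
```

A fast alternating motion shows up only at the finest scale, while a constant signal produces zero detail energy at every scale, so each frame's vector summarizes which time scales are active.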
DOI: 10.1109/ICASSP.2019.8683849 (https://doi.org/10.1109/ICASSP.2019.8683849) | Pages: 2887-2891 | Published: 2019-05-12
Citations: 8
Multi-step Self-attention Network for Cross-modal Retrieval Based on a Limited Text Space
Zheng Yu, Wenmin Wang, Ge Li
Cross-modal retrieval has recently been proposed to find an appropriate subspace in which the similarity between different modalities, such as image and text, can be measured directly. In this paper, we propose a Multi-step Self-Attention Network (MSAN) that performs cross-modal retrieval in a limited text space with multiple attention steps; it can selectively attend to partially shared information at each step and aggregate useful information over multiple steps to measure the final similarity. To achieve better retrieval results with faster training, we introduce global prior knowledge as global reference information. Extensive experiments on Flickr30K and MSCOCO show that MSAN achieves new state-of-the-art accuracy for cross-modal retrieval.
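A generic multi-step attention similarity can be sketched as follows: at each step, an image query attends over word vectors with a softmax, the attended context is scored against the image by cosine similarity, and the query is updated with the context before the next step. The additive query update, the averaging of per-step scores and all names are assumptions for illustration; MSAN's actual architecture, its limited text space and its global prior knowledge are not modelled.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def attend(query, keys):
    """Dot-product attention: weights over keys and the weighted context vector."""
    w = softmax([dot(query, k) for k in keys])
    ctx = [sum(wi * k[d] for wi, k in zip(w, keys)) for d in range(len(query))]
    return w, ctx


def multi_step_similarity(image_vec, word_vecs, steps=3):
    """Average image-context cosine over several attention steps (hypothetical scheme)."""
    q, total = list(image_vec), 0.0
    for _ in range(steps):
        _, ctx = attend(q, word_vecs)
        total += cosine(image_vec, ctx)
        q = [qi + ci for qi, ci in zip(q, ctx)]  # assumed memory update
    return total / steps
```

Words aligned with the image dominate the attention weights, so a caption containing matching words scores higher than one that does not.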
DOI: 10.1109/ICASSP.2019.8682424 (https://doi.org/10.1109/ICASSP.2019.8682424) | Pages: 2082-2086 | Published: 2019-05-12
Citations: 0
Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, H. Meng
Speech emotion recognition (SER) plays an important role in intelligent speech interaction. One vital challenge in SER is extracting emotion-relevant features from speech signals. In state-of-the-art SER techniques, deep learning methods, e.g., Convolutional Neural Networks (CNNs), are widely employed for feature learning and have achieved significant performance. However, CNN-oriented methods have two performance limitations: 1) the temporal structure of speech is lost through progressive resolution reduction; 2) the relative dependencies between elements in the suprasegmental feature sequence are ignored. In this paper, we propose combining a Dilated Residual Network (DRN) with Multi-head Self-attention to alleviate these limitations. By employing DRN, the network can retain a high temporal resolution in feature learning, with a receptive field of similar size to that of CNN-based approaches. By employing Multi-head Self-attention, the network can model the inner dependencies between elements at different positions in the learned suprasegmental feature sequence, which enhances the incorporation of emotion-salient information. Experiments on the emotional benchmark dataset IEMOCAP demonstrate the effectiveness of the proposed framework, with 11.7% to 18.6% relative improvement over state-of-the-art approaches.
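The resolution argument for dilated convolutions can be checked with a small sketch: a stride-1, zero-padded dilated convolution keeps the sequence length, while stacking kernels of size k with dilations d_i grows the receptive field as 1 + Σ(k − 1)d_i. The residual block (input plus ReLU of the convolution), the odd kernel length and the function names are illustrative assumptions; the multi-head self-attention stage is not shown.

```python
def dilated_conv1d(x, w, dilation):
    """'Same'-length dilated 1-D cross-correlation with zero padding (odd kernel length assumed)."""
    half = (len(w) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            idx = t + (i - half) * dilation  # taps spaced `dilation` frames apart
            if 0 <= idx < len(x):
                acc += wi * x[idx]
        out.append(acc)
    return out


def residual_block(x, w, dilation):
    """y = x + ReLU(dilated_conv(x)); the output length equals the input length."""
    return [xi + max(0.0, yi) for xi, yi in zip(x, dilated_conv1d(x, w, dilation))]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, four size-3 layers with dilations 1, 2, 4 and 8 cover 31 frames without any pooling, so every output frame still lines up with an input frame.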
DOI: 10.1109/ICASSP.2019.8682154 (https://doi.org/10.1109/ICASSP.2019.8682154) | Pages: 6675-6679 | Published: 2019-05-12
Citations: 45
Baseline Wander Removal and Isoelectric Correction in Electrocardiograms Using Clustering
Kjell Le, T. Eftestøl, K. Engan, Ø. Kleiven, S. Ørn
Baseline wander is low-frequency noise that is often removed from electrocardiogram signals with a high-pass filter. However, this may not be sufficient to correct the isoelectric level of the signal, because an isoelectric bias can remain. The isoelectric level is used as the reference point for amplitude measurements, and it is recommended that this point be at 0 V, i.e., isoelectric adjusted. To correct the isoelectric level, a clustering method is proposed to determine the isoelectric bias, which is then subtracted from a signal-averaged template. The mean electrical axis (MEA) is calculated to evaluate the isoelectric correction. The MEA can be estimated from any lead pair in the frontal plane, and a low variance of the estimates across lead pairs suggests that the MEA calculations in the individual lead pairs are consistent. Several methods for calculating the MEA are evaluated; the variance of the results, as well as other measures, favour the proposed isoelectric-adjusted signals for all MEA methods.
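One plausible way to estimate an isoelectric bias by clustering is to run 1-D k-means on the samples of the signal-averaged beat and take the center of the most populated cluster, on the assumption that flat baseline segments (e.g., PR and TP intervals) contribute the most samples; the bias is then subtracted from the template. The choice of k, the initialization, the "largest cluster" rule and the function names are assumptions for illustration, not the paper's exact method.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with centers initialized evenly between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    # Final assignment consistent with the returned centers.
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels


def isoelectric_level(template, k=3):
    """Assumed rule: the most populated amplitude cluster is the isoelectric baseline."""
    centers, labels = kmeans_1d(template, k)
    counts = [labels.count(c) for c in range(k)]
    return centers[counts.index(max(counts))]


def correct_isoelectric(template, k=3):
    """Subtract the estimated isoelectric bias so the baseline sits at 0."""
    level = isoelectric_level(template, k)
    return [v - level for v in template], level
```

After correction, amplitude measurements on the template are referenced to a baseline at 0, which is the condition the MEA comparison in the abstract is meant to evaluate.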
{"title":"Baseline Wander Removal and Isoelectric Correction in Electrocardiograms Using Clustering","authors":"Kjell Le, T. Eftestøl, K. Engan, Ø. Kleiven, S. Ørn","doi":"10.1109/ICASSP.2019.8683084","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683084","url":null,"abstract":"Baseline wander is a low frequency noise which is often removed by a highpass filter in electrocardiogram signals. However, this might not be sufficient to correct the isoelectric level of the signal, there exist an isoelectric bias. The isoelectric level is used as a reference point for amplitude measurements, and is recommended to have this point at 0 V, i.e. isoelectric adjusted. To correct the isoelectric level a clustering method is proposed to determine the isoelectric bias, which is thereafter subtracted from a signal averaged template. Calculation of the mean electrical axis (MEA) is used to evaluate the iso-electric correction. The MEA can be estimated from any lead pairs in the frontal plane, and a low variance in the estimates over the different lead pairs would suggest that the calculation of the MEA in each lead pair are consistent. Different methods are evaluated for calculating MEA, and the variance in the results as well as other measures, favour the proposed isoelectric adjusted signals in all MEA methods.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"1274-1278"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90022306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1