
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Self-stabilized deep neural network
Pegah Ghahremani, J. Droppo
Deep neural network models have been successfully applied to many tasks such as image labeling and speech recognition. Mini-batch stochastic gradient descent is the most prevalent method for training these models. A critical part of successfully applying this method is choosing appropriate initial values, as well as local and global learning rate scheduling algorithms. In this paper, we present a method that is less sensitive to the choice of initial values, works better than popular learning rate adjustment algorithms, and speeds up convergence of the model parameters. We show that with the Self-stabilized DNN method, initial learning rate tuning is no longer required, and training converges quickly with a fixed global learning rate. The proposed method yields promising results compared with the conventional DNN structure, including a faster convergence rate.
DOI: 10.1109/ICASSP.2016.7472719
Citations: 7
An acoustic keystroke transient canceler for speech communication terminals using a semi-blind adaptive filter model
H. Buchner, J. Skoglund, S. Godsill
In many teleconferencing applications on modern laptop and netbook devices, annoying keyboard typing noise is commonly encountered. In this paper we propose an acoustic keystroke transient canceler for speech communication terminals as a novel broadband adaptive filter application in such a hands-free scenario. We present this approach in the context of the Google Chromebook Pixel device, which is equipped with a special audio reference channel that provides various new signal processing possibilities. Our novel semi-blind/semi-supervised approach, which exploits this new degree of freedom, combined with system-based broadband estimation and a novel adaptation control, yields high-quality speech enhancement even under challenging acoustic conditions.
DOI: 10.1109/ICASSP.2016.7471748
Citations: 3
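For the keystroke-canceler entry above: the abstract does not spell out the semi-blind algorithm, so the sketch below shows only the generic idea it builds on. A normalized LMS (NLMS) filter removes whatever part of the microphone signal is linearly predictable from a keyboard reference channel. The function name, filter length, step size and the simulated acoustic path are assumptions for illustration. This is a textbook NLMS baseline, not the authors' semi-blind method or adaptation control.

```python
import random


def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    Generic NLMS reference canceler used only as an illustration; it is NOT the
    semi-blind/semi-supervised scheme described in the abstract.
    """
    w = [0.0] * taps            # adaptive FIR estimate of the reference-to-mic path
    buf = [0.0] * taps          # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated keystroke component
        e = d - y                                    # enhanced (residual) output sample
        norm = eps + sum(b * b for b in buf)         # input power for step normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


# Hypothetical test: the microphone hears only a filtered copy of the reference.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(4000)]
path = [0.6, -0.3, 0.1]         # assumed acoustic path from keyboard reference to mic
mic = [sum(path[k] * ref[n - k] for k in range(len(path)) if n - k >= 0)
       for n in range(len(ref))]
residual = nlms_cancel(mic, ref)
early = sum(e * e for e in residual[:200])
late = sum(e * e for e in residual[-200:])
print(f"residual energy: first 200 samples {early:.3f}, last 200 samples {late:.3e}")
```

With no speech present, the residual energy should fall to near zero once the filter has converged. A real canceler also has to keep adapting while speech is present, which is the hard part the abstract's adaptation control addresses.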
Data sketching for large-scale Kalman filtering
Dimitris Berberidis, G. Giannakis
In an age of exponentially increasing data generation, performing inference tasks by utilizing all of the available information is not always an affordable option. The present paper puts forth approaches that make tracking of large-scale dynamic processes affordable by processing a reduced amount of data. Two distinct methods are introduced for reducing the number of measurements involved per time step. The first method builds on reduction using low-complexity random projections, while the second performs censoring for data-adaptive measurement selection. Simulations on synthetic data compare the proposed methods with competing alternatives and corroborate their efficacy in terms of estimation accuracy versus complexity reduction.
DOI: 10.1109/TSP.2017.2691662
Citations: 21
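For the data-sketching entry above, here is a minimal sketch of the second idea, censoring, under assumed conditions:

- a scalar random-walk state;
- independent scalar measurements, processed one at a time;
- a hypothetical rule that skips any measurement whose normalized innovation |nu|/sqrt(s) falls below a threshold tau.

The paper's exact censoring rule and its random-projection method are not reproduced. All names and parameter values are illustrative.

```python
import math
import random


def censored_kalman_step(x, p, ys, hs, q, r, tau):
    """One predict/update step for a scalar random-walk state x_t = x_{t-1} + w_t.

    Each scalar measurement y_i = h_i * x_t + v_i (independent noise, variance r) is
    processed sequentially, but only if its normalized innovation |nu| / sqrt(s)
    reaches tau; the others are censored (skipped) to save computation.
    Returns the updated mean, variance and the number of measurements used.
    """
    p = p + q                          # predict: random-walk variance grows by q
    used = 0
    for y, h in zip(ys, hs):
        s = h * h * p + r              # innovation variance
        nu = y - h * x                 # innovation
        if abs(nu) / math.sqrt(s) < tau:
            continue                   # censored: treated as uninformative
        k = p * h / s                  # Kalman gain
        x = x + k * nu
        p = (1.0 - k * h) * p
        used += 1
    return x, p, used


rng = random.Random(1)
q, r, tau, m = 0.01, 1.0, 0.5, 200     # assumed model, threshold and measurements per step
x_true, x_est, p_est = 0.0, 0.0, 1.0
total_used = 0
for _ in range(50):
    x_true += rng.gauss(0.0, math.sqrt(q))
    hs = [rng.gauss(0.0, 1.0) for _ in range(m)]
    ys = [h * x_true + rng.gauss(0.0, math.sqrt(r)) for h in hs]
    x_est, p_est, used = censored_kalman_step(x_est, p_est, ys, hs, q, r, tau)
    total_used += used
print(f"used {total_used} of {50 * m} measurements, final error {abs(x_est - x_true):.3f}")
```

Raising tau discards more measurements and saves more computation, at the cost of accuracy. That is the trade-off the abstract's simulations evaluate.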
Improved decoding of analog modulo block codes for noise mitigation
Tim Schmitz, P. Jax, P. Vary
A drawback of digital transmission of analog signals is the unavoidable quantization error, which limits quality even under good channel conditions. This saturation can be avoided by using analog transmission systems with discrete-time and quasi-continuous-amplitude encoding and decoding, e.g., Analog Modulo Block codes (AMB codes). The AMB code vectors are produced by multiplying a real-valued information vector with a real-valued generator matrix using modulo arithmetic. Here, algorithms for improving the decoding performance are presented. The Lattice Maximum Likelihood (LML) decoder, a variant of the Discrete Maximum Likelihood (DML) decoder, is derived and analyzed. It refines the Zero Forcing (ZF) result when necessary, thus achieving near-ML signal quality with reduced decoding complexity. Reduced complexity is essential for decoding high-dimensional code words. Additionally, pre- and post-processing methods that increase the signal-to-distortion ratio (SDR) of the received symbols are presented and analyzed.
DOI: 10.1109/ICASSP.2016.7472400
Citations: 1
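For the AMB-code entry above, the abstract's encoding rule can be sketched as follows: multiply a real-valued information vector by a real-valued generator matrix, then apply modulo arithmetic. Several details are assumptions rather than details from the paper:

- the symmetric modulo interval [-period/2, period/2);
- the period of 2;
- the toy systematic generator matrix.

Decoding (ZF or LML) is not shown.

```python
def modulo_centered(value, period):
    """Map value into the symmetric interval [-period/2, period/2)."""
    return (value + period / 2.0) % period - period / 2.0


def amb_encode(info, generator, period=2.0):
    """Toy AMB-style encoder: c = (G x) mod period, applied row by row."""
    return [modulo_centered(sum(g * x for g, x in zip(row, info)), period)
            for row in generator]


# Hypothetical systematic generator: two information rows plus one parity row.
G = [[1.0, 0.0],
     [0.0, 1.0],
     [1.5, 2.5]]
print(amb_encode([0.3, -0.4], G))   # no wrap-around needed
print(amb_encode([0.8, 0.6], G))    # parity 2.7 wraps to about 0.7
```

The wrap-around keeps every transmitted amplitude bounded. However, it means a decoder must work out which integer multiple of the period was removed from each component. This is presumably the ambiguity that a lattice-based decoder such as LML resolves.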
Functional connectivity brain network analysis through network to signal transform based on the resistance distance
Marisel Villafañe-Delgado, Selin Aviyente
Functional connectivity brain networks have been shown to demonstrate interesting complex network behavior such as small-worldness. Transforming networks to time series has provided an alternative way of characterizing the structure of complex networks. However, previously proposed deterministic methods are limited to unweighted graphs. In this paper, we propose to employ the resistance distance matrix of weighted graphs as the distance matrix for transforming networks to signals based on classical multidimensional scaling. We present a framework for obtaining information about the network's structure through the mapped signals and recovering the original network using properties of the resistance matrix. Finally, the proposed method is applied to characterizing functional connectivity networks constructed from electroencephalogram data.
DOI: 10.1109/ICASSP.2016.7471766
Citations: 2
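For the brain-network entry above, the resistance distance matrix can be computed from the graph Laplacian L of a connected weighted graph. The formula is R_ij = G_ii + G_jj - 2 G_ij with G = (L + J/n)^-1, where J is the all-ones matrix. The J/n terms cancel, so this equals the usual formula based on the pseudoinverse of L. This is a standard identity, not code from the paper. Edge weights are treated as conductances, and the classical MDS step that maps R to signals is omitted.

```python
def laplacian(weights):
    """Graph Laplacian L = D - W of a symmetric, non-negative weight matrix with zero diagonal."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (adequate for small dense matrices)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(m[row_idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for row_idx in range(n):
            if row_idx != col:
                factor = m[row_idx][col]
                m[row_idx] = [a_v - factor * c_v for a_v, c_v in zip(m[row_idx], m[col])]
    return [row[n:] for row in m]


def resistance_distance(weights):
    """R_ij = G_ii + G_jj - 2 G_ij with G = (L + J/n)^-1; the graph must be connected."""
    n = len(weights)
    lap = laplacian(weights)
    gamma = invert([[lap[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [[gamma[i][i] + gamma[j][j] - 2.0 * gamma[i][j] for j in range(n)]
            for i in range(n)]


# Path graph 0-1-2 with unit conductances: R(0,1) = 1 and R(0,2) = 2 (series resistors).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(resistance_distance(path))
```

Unlike shortest-path distance, resistance distance uses the weights and all parallel paths. For example, each pair in a unit-weight triangle has resistance 2/3, not 1. This is one reason it suits weighted functional connectivity graphs.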
Universal encoding of multispectral images
D. Valsesia, P. Boufounos
We propose a new method for low-complexity compression of multispectral images. We build on a novel approach to coding signals with side information, based on recent advances in compressed sensing and universal scalar quantization. Our approach can be interpreted as a variation of quantized compressed sensing, in which the most significant bits are discarded at the encoder and recovered at the decoder from the side information. The image is reconstructed using weighted total variation minimization, which incorporates side information in the weights while enforcing consistency with the recovered quantized coefficient values. Our experiments validate the approach and confirm the improvements in rate-distortion performance.
DOI: 10.1109/ICASSP.2016.7472519
Citations: 25
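For the multispectral-coding entry above, the idea of discarding the most significant bits and recovering them from side information can be illustrated on a single scalar measurement. The encoder keeps only the `bits` least significant bits of a uniform quantization index. The decoder then picks, among indices with those bits, the one whose cell lies closest to a side-information estimate. This is a simplified, hypothetical stand-in, with illustrative function names. It omits compressed-sensing measurements, dithering and the weighted total variation reconstruction.

```python
import math


def encode_lsb(y, delta, bits):
    """Uniformly quantize y with step delta and keep only the `bits` least significant bits."""
    index = math.floor(y / delta)
    return index % (1 << bits)           # Python's % keeps the result in [0, 2**bits)


def decode_with_side_info(lsb, side, delta, bits):
    """Recover the discarded high-order bits using a side-information estimate `side`."""
    period = 1 << bits
    guess = math.floor(side / delta)
    base = guess - ((guess - lsb) % period)      # largest index <= guess with matching LSBs
    candidates = [base + k * period for k in (-1, 0, 1)]
    best = min(candidates, key=lambda i: abs((i + 0.5) * delta - side))
    return (best + 0.5) * delta          # reconstruct at the center of the chosen cell


lsb = encode_lsb(5.3, delta=1.0, bits=2)  # only 2 bits are transmitted
print(lsb, decode_with_side_info(lsb, side=4.8, delta=1.0, bits=2))
```

Recovery works only while the side information is close enough to the true value, roughly within half of the 2**bits-cell wrap-around range. With `side=2.0` instead of `4.8`, the same call returns 1.5 instead of 5.5.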
Multi-pair two-way AF relaying systems with massive arrays and imperfect CSI
Chuili Kong, C. Zhong, M. Matthaiou, Emil Björnson, Zhaoyang Zhang
We consider a multi-pair two-way amplify-and-forward relaying system with a massive antenna array at the relay and estimated channel state information, assuming maximum-ratio combining/transmission processing. Closed-form approximations of the sum spectral efficiency are developed and simple analytical power scaling laws are presented, which reveal a fundamental trade-off between the transmit powers of each user/the relay and of each pilot symbol. Finally, the optimal power allocation problem is studied.
DOI: 10.1109/ICASSP.2016.7472358
Citations: 12
Block compressed sensing based distributed resource allocation for M2M communications
Yunyan Chang, P. Jung, Chan Zhou, S. Stańczak
In this paper, we use the framework of compressed sensing (CS) for device detection and distributed resource allocation in large-scale machine-to-machine (M2M) communication networks. The devices are partitioned into clusters according to some predefined criteria, e.g., proximity or service type. Moreover, owing to the sparse nature of event occurrence in M2M communications, the activation pattern of the M2M devices can be formulated in CS-based applications as a particular block-sparse signal with additional in-block structure. This paper introduces a novel scheme for distributed resource allocation to M2M devices based on block-CS techniques, which consists of three main phases: (1) In a full-duplex acquisition phase, the network activation pattern is collected in a distributed manner. (2) The base station detects the active clusters and the number of active devices in each cluster, and then assigns a corresponding amount of resources. (3) Each active device determines the rank of its index among all active devices in its cluster and accesses the corresponding resource for transmission. Compared with standard CS algorithms, the proposed scheme can efficiently reduce the acquisition time at a much lower computational complexity. Finally, extensive simulations confirm the robustness of the proposed scheme under noisy conditions.
DOI: 10.1109/ICASSP.2016.7472386
Citations: 7
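For the M2M entry above, phases (2) and (3) reduce to simple bookkeeping once the activation pattern is known. The sketch below rests on three assumptions:

- the pattern recovered in phase (1) is available as per-cluster 0/1 lists;
- clusters receive contiguous resource blocks in order of cluster id;
- ranks are computed centrally for clarity, whereas in the described scheme each device determines its own rank.

The function name is hypothetical.

```python
def allocate_resources(activation):
    """Hypothetical sketch of phases (2) and (3).

    activation: dict mapping cluster id -> list of 0/1 activity flags, one per device
    (the block-sparse pattern assumed to have been recovered in phase 1).
    Returns a dict mapping (cluster, device index) -> resource index.
    """
    # Phase 2: count active devices per active cluster and reserve a
    # contiguous block of resources for each such cluster.
    offsets, next_free = {}, 0
    for cluster in sorted(activation):
        count = sum(activation[cluster])
        if count:
            offsets[cluster] = next_free
            next_free += count
    # Phase 3: each active device takes the slot given by its rank among
    # the active devices of its own cluster.
    assignment = {}
    for cluster, base in offsets.items():
        rank = 0
        for device, active in enumerate(activation[cluster]):
            if active:
                assignment[(cluster, device)] = base + rank
                rank += 1
    return assignment


pattern = {0: [0, 1, 0, 1], 1: [0, 0, 0, 0], 2: [1, 0, 0, 0]}
print(allocate_resources(pattern))
```

Because each device only needs its cluster's offset and its own rank, the base station never has to signal individual device assignments. This is one plausible reason the scheme can stay distributed.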
An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources
Xiong Xiao, Shengkui Zhao, Thi Ngoc Tho Nguyen, Douglas L. Jones, Chng Eng Siong, Haizhou Li
This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. In our work, time-frequency (TF) bins dominated by only one source are first detected using a combination of noise-floor tracking, onset detection and a coherence test. For each selected TF bin, the principal eigenvector of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors of all frequencies, the steering vectors of the sources at all frequencies are used in the distribution function. The DOAs of the sources can be estimated by maximizing the likelihood of the eigenvectors using an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources, compared with the MUSIC algorithm applied to the single-source-dominated TF bins and with our previous clustering approach.
DOI: 10.1109/ICASSP.2016.7472895
Citations: 5
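For the DOA entry above, here is a toy version of the EM clustering step. Each mixture component scores a unit-norm eigenvector v at frequency f by exp(kappa * |a(f, theta_k)^H v|^2), where a(f, theta) is the steering vector. As a result, all frequencies share one DOA parameter per source. The following are assumptions, not details from the paper:

- a uniform linear array of 4 microphones with 4 cm spacing;
- the Watson-like score;
- kappa = 10;
- a 1-degree DOA grid.

The single-source TF-bin detection stage is skipped, and the synthetic eigenvectors are noise-free.

```python
import cmath
import math


def steering(freq, theta_deg, mics=4, spacing=0.04, c=343.0):
    """Unit-norm far-field steering vector of an assumed uniform linear array."""
    delay = spacing * math.cos(math.radians(theta_deg)) / c   # inter-mic delay in seconds
    return [cmath.exp(-2j * math.pi * freq * m * delay) / math.sqrt(mics)
            for m in range(mics)]


def match(freq, theta_deg, v):
    """|a(f, theta)^H v|^2: equals 1 when v is the steering vector up to a phase."""
    a = steering(freq, theta_deg, len(v))
    return abs(sum(ai.conjugate() * vi for ai, vi in zip(a, v))) ** 2


def em_doa(observations, init_doas, kappa=10.0, iters=10, grid=range(0, 181)):
    """EM for a mixture whose k-th component scores v by exp(kappa * match(f, theta_k, v)).

    observations: list of (frequency, unit-norm principal eigenvector) pairs.
    All components share kappa, so the normalizing constant cancels in the E-step.
    """
    doas = list(init_doas)
    weights = [1.0 / len(doas)] * len(doas)
    for _ in range(iters):
        # E-step: posterior probability that each eigenvector belongs to each source.
        resp = []
        for f, v in observations:
            scores = [w * math.exp(kappa * match(f, d, v)) for w, d in zip(weights, doas)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: update mixture weights, then grid-search each DOA using all frequencies.
        for k in range(len(doas)):
            weights[k] = sum(r[k] for r in resp) / len(resp)
            doas[k] = max(grid, key=lambda th: sum(
                r[k] * match(f, th, v) for r, (f, v) in zip(resp, observations)))
    return doas


# Noise-free synthetic eigenvectors from two sources at 40 and 120 degrees.
freqs = [250.0 * k for k in range(2, 13)]   # 500 Hz to 3000 Hz
observations = ([(f, steering(f, 40)) for f in freqs]
                + [(f, steering(f, 120)) for f in freqs])
estimates = em_doa(observations, init_doas=[30, 150])
print(estimates)
```

Because the M-step sums evidence over every frequency for a single DOA, high-frequency bins can contribute without each bin having to resolve its own aliasing. This mirrors the motivation given in the abstract, although the paper's actual component distribution may differ.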
Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction
Matt McVicar, Raúl Santos-Rodríguez, T. D. Bie
Separating the singing voice from a polyphonic audio mixture is a challenging but important task with a wide range of applications in the music industry and music informatics research. Various methods have been devised over the years, ranging from deep learning approaches to dedicated ad hoc solutions. In this paper, we present a novel machine learning method for the task, using a Conditional Random Field (CRF) approach for structured output prediction. We exploit the diversity of previously proposed approaches by using their predictions as input features to our method, thus effectively developing an ensemble method. Our empirical results demonstrate the potential of integrating predictions from different previously proposed methods into one ensemble method, and additionally show that more complex CRF models generally lead to superior performance.
DOI: 10.1109/ICASSP.2016.7471715
Citations: 8