
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Better beat tracking through robust onset aggregation
Brian McFee, D. Ellis
Onset detection forms the critical first stage of most beat tracking algorithms. While common spectral-difference onset detectors can work well in genres with clear rhythmic structure, they can be sensitive to loud, asynchronous events (e.g., off-beat notes in a jazz solo), which limits their general efficacy. In this paper, we investigate methods to improve the robustness of onset detection for beat tracking. Experimental results indicate that simple modifications to onset detection can produce large improvements in beat tracking accuracy.
DOI: https://doi.org/10.1109/ICASSP.2014.6853980 | Pages: 2154-2158 | Published: 2014-05-04
Citations: 14
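The beat-tracking abstract above says that spectral-difference onset detectors are sensitive to loud, asynchronous events, but it does not name the modification the authors study. The sketch below only illustrates the general idea: it computes a half-wave rectified spectral difference per frame and compares two ways of aggregating across frequency bins. Using the median as the robust aggregator is an assumption made for illustration, and the toy spectrogram is hypothetical data.

```python
from statistics import median


def onset_envelope(spectrogram, aggregate=sum):
    """Spectral-difference onset strength for each frame transition.

    spectrogram: list of frames, each a list of non-negative magnitudes
    (one value per frequency bin).
    aggregate: function that collapses the per-bin increases of one
    transition into a single number (sum is the common choice).
    """
    envelope = []
    for prev, curr in zip(spectrogram, spectrogram[1:]):
        # Half-wave rectification: keep only increases in energy.
        increases = [max(c - p, 0.0) for p, c in zip(prev, curr)]
        envelope.append(aggregate(increases))
    return envelope


# Hypothetical 4-bin spectrogram: frame 1 has one loud event in a single
# bin, frame 3 has a moderate increase in every bin.
frames = [
    [0.0, 0.0, 0.0, 0.0],
    [9.0, 0.0, 0.0, 0.0],
    [9.0, 0.0, 0.0, 0.0],
    [10.0, 1.0, 1.0, 1.0],
]

sum_env = onset_envelope(frames, aggregate=sum)        # dominated by the loud bin
median_env = onset_envelope(frames, aggregate=median)  # favors broadband change
```

With sum aggregation the single loud bin produces the largest peak, while median aggregation (an assumed stand-in for a robust aggregator) responds only to the increase shared by most bins.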
Combination coefficients for fastest convergence of distributed LMS estimation
K. Wagner, M. Doroslovački
Diffusion strategies for learning across networks which minimize the transient regime mean-square deviation across all nodes are presented. The problem of choosing combination coefficients which minimize the mean-square deviation at all given time instances results in a quadratic program with linear constraints. The implementation of the optimal procedure is based on the estimation of weight deviation vectors for which an algorithm is proposed. Additionally, the optimization that uses relaxed constraints is considered. The proposed methods were validated through simulations for different estimation distribution strategies and input signals. The results show a potential for significant improvement of the convergence speed.
DOI: https://doi.org/10.1109/ICASSP.2014.6855001 | Pages: 7218-7222 | Published: 2014-05-04
Citations: 10
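To make the role of combination coefficients in the distributed LMS abstract above concrete, here is a minimal sketch of one adapt-then-combine diffusion LMS step. The abstract does not say which diffusion variant is used, so adapt-then-combine is an assumption, and the function name, the fixed step size `mu`, and the uniform coefficients in the usage lines are hypothetical placeholders rather than the optimized coefficients the paper derives.

```python
def atc_diffusion_lms_step(weights, regressors, desired, combos, mu):
    """One adapt-then-combine diffusion LMS step (illustrative sketch).

    weights:    list of per-node weight vectors (lists of floats)
    regressors: list of per-node input vectors for this time instant
    desired:    list of per-node desired scalar outputs
    combos:     combos[l][k] is the weight node k gives to node l;
                each column k is assumed non-negative and sums to one
    mu:         LMS step size
    """
    n_nodes = len(weights)

    # Adapt: every node takes a local LMS step on its own data.
    intermediate = []
    for w, u, d in zip(weights, regressors, desired):
        error = d - sum(wi * ui for wi, ui in zip(w, u))
        intermediate.append([wi + mu * error * ui for wi, ui in zip(w, u)])

    # Combine: every node mixes the intermediate estimates of all nodes.
    combined = []
    for k in range(n_nodes):
        dim = len(intermediate[k])
        combined.append([
            sum(combos[l][k] * intermediate[l][j] for l in range(n_nodes))
            for j in range(dim)
        ])
    return combined


# Two nodes, scalar weights, uniform (non-optimized) combination coefficients.
new_weights = atc_diffusion_lms_step(
    weights=[[0.0], [0.0]],
    regressors=[[1.0], [1.0]],
    desired=[1.0, 3.0],
    combos=[[0.5, 0.5], [0.5, 0.5]],
    mu=0.5,
)
```

The paper's contribution concerns how to choose `combos` at each time instant; the sketch only shows where those coefficients enter the update.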
Iterative blind estimation of nonlinear channels
J. Dohl, G. Fettweis
Nonlinear distortions in analog frontends are becoming a growing problem which is not limited to power amplifiers. Modern modulation methods such as OFDM and next generation standards have high linearity requirements on all components in the signal path. A radio system that can tolerate a certain degree of nonlinear distortion without substantial loss of performance could enable high cost savings in development and production. In this paper we present a novel iterative blind estimator for nonlinear distortions. It complements existing mitigation algorithms by providing them with accurate estimates of the nonlinearity characteristic. It is shown that there is a negligible performance gap between perfect and estimated knowledge. The method is designed to be computationally inexpensive and can be readily implemented on today's digital signal processing systems.
DOI: https://doi.org/10.1109/ICASSP.2014.6854337 | Pages: 3923-3927 | Published: 2014-05-04
Citations: 9
Fast computation of the L1-principal component of real-valued data
S. Kundu, Panos P. Markopoulos, D. Pados
Recently, Markopoulos et al. [1], [2] presented an optimal algorithm that computes the L1 maximum-projection principal component of any set of N real-valued data vectors of dimension D with complexity polynomial in N, O(N^D). Still, moderate to high values of the data dimension D and/or data record size N may render the optimal algorithm unsuitable for practical implementation because its complexity is exponential in D. In this paper, we present for the first time in the literature a fast greedy single-bit-flipping conditionally optimal iterative algorithm for the computation of the L1 principal component with complexity O(N^3). Detailed numerical studies demonstrate the effectiveness of the developed algorithm, with applications to data dimensionality reduction and direction-of-arrival estimation.
DOI: https://doi.org/10.1109/ICASSP.2014.6855164 | Pages: 8028-8032 | Published: 2014-05-04
Citations: 58
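The single-bit-flipping idea in the L1-principal-component abstract above can be sketched as follows. The sketch relies on the known equivalence between maximizing the L1 norm of the projections over unit vectors q and maximizing the L2 norm of the signed sum of the data vectors over sign vectors b in {-1, +1}^N. The all-ones initialization, the stopping tolerance, and the function name are assumptions for illustration; the paper's exact initialization and flip rule may differ.

```python
import math


def l1_pc_bit_flipping(data):
    """Greedy single-bit-flipping search for the L1 principal component.

    data: list of N data vectors, each a list of D floats (not all zero).
    Returns (q, b): a unit vector q and the sign vector b in {-1, +1}^N.
    """
    n, d = len(data), len(data[0])
    b = [1] * n  # assumed initialization: all signs positive
    # s is the signed sum of the data vectors; its norm is the objective.
    s = [sum(x[j] for x in data) for j in range(d)]

    def sq_norm(v):
        return sum(c * c for c in v)

    while True:
        current = sq_norm(s)
        best_gain, best_idx = 0.0, None
        for i, x in enumerate(data):
            # Flipping b[i] changes the signed sum by -2 * b[i] * x.
            candidate = [s[j] - 2 * b[i] * x[j] for j in range(d)]
            gain = sq_norm(candidate) - current
            if gain > best_gain + 1e-12:
                best_gain, best_idx = gain, i
        if best_idx is None:
            break  # no single flip increases the objective
        x = data[best_idx]
        s = [s[j] - 2 * b[best_idx] * x[j] for j in range(d)]
        b[best_idx] = -b[best_idx]

    norm = math.sqrt(sq_norm(s))
    return [c / norm for c in s], b


# Three collinear points: the signs align so that all projections add up.
q, b = l1_pc_bit_flipping([[1.0, 0.0], [2.0, 0.0], [-3.0, 0.0]])
```

Each flip strictly increases the objective, so the loop terminates; the result is a local optimum with respect to single flips, which is weaker than the optimality of the exhaustive algorithm.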
CCA based feature selection with application to continuous depression recognition from acoustic speech features
Heysem Kaya, F. Eyben, A. A. Salah, Björn Schuller
In this study we use Canonical Correlation Analysis (CCA) based feature selection for continuous depression recognition from speech. Besides its common use in multi-modal/multi-view feature extraction, CCA can be easily employed as a feature selector. We introduce several novel CCA-based filter (ranking) methods and show their relations to previous work. We test the suitability of the proposed methods on the AVEC 2013 dataset under the ACM MM 2013 Challenge protocol. Using 17% of the features, we obtained a relative improvement of 30% on the challenge's test-set baseline Root Mean Square Error.
DOI: https://doi.org/10.1109/ICASSP.2014.6854298 | Pages: 3729-3733 | Published: 2014-05-04
Citations: 64
Ergodic interference alignment for the SIMO/MIMO interference channel
Yohan Lejosne, D. Slock, Y. Yuan-Wu
Ergodic interference alignment (IA) is a simple yet powerful tool that not only achieves the optimal K/2 degrees of freedom (DoF) of the K-user single-input single-output (SISO) interference channel (IC), but also allows each user to achieve at least half of its interference-free capacity at any SNR. By considering more general message sets, Nazer et al. also covered the MISO case. In this paper, we consider first the SIMO interference channel and extend ergodic IA techniques to this setting with N_r receive antennas. Our scheme achieves K N_r/(N_r + 1), which is the DoF yielded by (standard) IA and is also the DoF of the channel when K > N_r. Moreover, this technique exhibits spatial scale invariance. By combining the existing MISO and the new SIMO results, we can also cover MIMO with N_t transmit antennas for the cases where either N_t/N_r or N_r/N_t is an integer R, yielding DoF = min(N_t, N_r) K R/(R + 1), which is optimal for K > R.
DOI: https://doi.org/10.1109/ICASSP.2014.6854790 | Pages: 6172-6175 | Published: 2014-05-04
Citations: 2
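The degrees-of-freedom expression in the interference-alignment abstract above is easy to check numerically. The helper below is a small sketch that evaluates min(Nt, Nr) K R/(R + 1) with exact fractions; the function name is hypothetical, and the function does not check the K > R condition under which the abstract states optimality.

```python
from fractions import Fraction


def ergodic_ia_dof(k_users, n_t, n_r):
    """DoF = min(Nt, Nr) * K * R / (R + 1), where R is the integer
    antenna ratio Nt/Nr or Nr/Nt, as stated in the abstract."""
    big, small = max(n_t, n_r), min(n_t, n_r)
    if big % small != 0:
        raise ValueError("the formula is stated only for integer antenna ratios")
    ratio = big // small
    return Fraction(small * k_users * ratio, ratio + 1)


siso = ergodic_ia_dof(k_users=3, n_t=1, n_r=1)  # reduces to K/2
simo = ergodic_ia_dof(k_users=4, n_t=1, n_r=2)  # reduces to K*Nr/(Nr + 1)
mimo = ergodic_ia_dof(k_users=4, n_t=2, n_r=4)
```

With one antenna at each end the expression reduces to K/2, and with a single transmit antenna it reduces to the SIMO value K Nr/(Nr + 1), which matches the two special cases quoted in the abstract.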
An inference framework for detection of home appliance activation from voltage measurements
Zeyu You, R. Raich, Yonghong Huang
We present an inference framework for automatic detection of activations of home appliances based on voltage envelope waveforms. We cast the problem of appliance detection and recognition as an inference problem. When the activation signatures are known, the problem reduces to a simple detection problem. When the activation signatures are unknown, the problem is reformulated as a blind joint delay estimation. Due to the non-convexity of the negative log-likelihood, finding a global optimal solution is a key challenge. Here, we introduce a novel algorithm to estimate the activation templates, which is guaranteed to yield an error within a factor of two of that of the optimal solution. We apply our method to a real-world dataset consisting of voltage waveform measurements of several appliances obtained in multiple homes over a few weeks. Based on ground truth data, we present a quantitative analysis of the proposed algorithm and alternative approaches.
DOI: https://doi.org/10.1109/ICASSP.2014.6854762 | Pages: 6033-6037 | Published: 2014-05-04
Citations: 6
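The appliance-detection abstract above notes that the problem becomes a simple detection problem when the activation signature is known. One way to picture that case is a sliding least-squares match of a known template against the measured waveform, which coincides with maximum-likelihood delay estimation under additive white Gaussian noise. The noise model, the single-activation assumption, the function name, and the toy numbers are all assumptions for illustration and are not taken from the paper.

```python
def best_delay(signal, template):
    """Return (delay, squared_error) for the best alignment of a known
    activation template inside a longer measured signal."""
    m = len(template)
    best, best_err = None, float("inf")
    for delay in range(len(signal) - m + 1):
        # Squared error between the template and the signal window.
        err = sum((signal[delay + i] - template[i]) ** 2 for i in range(m))
        if err < best_err:
            best, best_err = delay, err
    return best, best_err


# Hypothetical voltage-envelope samples containing one activation.
measured = [0.0, 0.0, 0.1, 1.0, 2.0, 1.0, 0.0, 0.0]
signature = [1.0, 2.0, 1.0]
delay, error = best_delay(measured, signature)
```

When the signatures are unknown, the abstract describes a harder blind joint delay estimation problem that this sketch does not address.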
Vocal timbre analysis using latent Dirichlet allocation and cross-gender vocal timbre similarity
Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto
This paper presents a vocal timbre analysis method based on topic modeling using latent Dirichlet allocation (LDA). Although many works have focused on analyzing characteristics of singing voices, none have dealt with “latent” characteristics (topics) of vocal timbre, which are shared by multiple singing voices. In the work described in this paper, we first automatically extracted vocal timbre features from polyphonic musical audio signals including vocal sounds. The extracted features were used as observed data, and mixing weights of multiple topics were estimated by LDA. Finally, the semantics of each topic were visualized by using a word-cloud-based approach. Experimental results for a singer identification task using 36 songs sung by 12 singers showed that our method achieved a mean reciprocal rank of 0.86. We also proposed a method for estimating cross-gender vocal timbre similarity by generating pitch-shifted (frequency-warped) signals of every singing voice. Experimental results for a cross-gender singer retrieval task showed that our method discovered interesting similar pitch-shifted singers.
DOI: https://doi.org/10.1109/ICASSP.2014.6854595 | Pages: 5202-5206 | Published: 2014-05-04
Citations: 20
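The vocal-timbre abstract above reports a mean reciprocal rank (MRR) of 0.86 for singer identification. For readers unfamiliar with the metric, the sketch below computes MRR from ranked candidate lists: each query contributes the reciprocal of the rank at which the correct answer appears. The function name and the toy rankings are hypothetical and unrelated to the paper's data.

```python
def mean_reciprocal_rank(rankings, truths):
    """Average of 1/rank of the correct item over all queries.

    rankings: list of ranked candidate lists, best candidate first
    truths:   list of the correct item for each query (must appear
              in the corresponding ranking)
    """
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        rank = ranking.index(truth) + 1  # ranks are 1-based
        total += 1.0 / rank
    return total / len(truths)


# The correct singer "a" is ranked first, second, and third respectively.
mrr = mean_reciprocal_rank(
    rankings=[["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]],
    truths=["a", "a", "a"],
)
```

An MRR of 1.0 means the correct singer is always ranked first, so a value of 0.86 indicates that the correct singer usually appears at or near the top.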
HRTF magnitude synthesis via sparse representation of anthropometric features
P. Bilinski, J. Ahrens, Mark R. P. Thomas, I. Tashev, John C. Platt
We propose a method for the synthesis of the magnitudes of Head-related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features with respect to the anthropometric features in the training set. The fundamental assumption is that the magnitudes of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly to the HRTF tensor data. For evaluation purposes we use a new dataset containing both anthropometric features and HRTFs. We compare the proposed sparse-representation-based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For instrumental evaluation we use log-spectral distortion. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as those chosen by the best possible HRTF classifier.
DOI: https://doi.org/10.1109/ICASSP.2014.6854447 | Pages: 4468-4472 | Published: 2014-05-04
Citations: 70
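The HRTF abstract above uses log-spectral distortion as its instrumental measure. A common definition is the root-mean-square, over frequency bins, of the magnitude ratio expressed in decibels; the sketch below implements that definition. The abstract does not spell out its exact formula (for example, which frequency range or directions are averaged), so this is an assumed standard form, and the function name and sample values are hypothetical.

```python
import math


def log_spectral_distortion(reference_mag, synthesized_mag):
    """RMS of the dB difference between two magnitude responses.

    Both inputs are equal-length lists of strictly positive magnitudes,
    one value per frequency bin.
    """
    squared_db = [
        (20.0 * math.log10(ref / syn)) ** 2
        for ref, syn in zip(reference_mag, synthesized_mag)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))


# One bin matches exactly, the other is off by a factor of 10 (20 dB).
lsd = log_spectral_distortion([1.0, 10.0], [1.0, 1.0])
```

A value of 0 dB means the synthesized magnitudes match the reference in every bin; larger values indicate a worse spectral match.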
Performance analysis of Bag-of-Features based content identification systems
S. Voloshynovskiy, M. Diephuis, T. Holotyak
Many state-of-the-art methods in image retrieval, classification and copy detection are based on the Bag-of-Features (BOF) framework. However, the performance of these systems is mostly evaluated experimentally, and few results are reported on theoretical performance. In this paper, we present a statistical framework that makes it possible to analyse the performance of a simple BOF system and to better understand the impact of different design elements such as the robustness of descriptors, the accuracy of encoding/assignment, information-preserving pooling and, finally, decision making. The proposed framework can also be of interest for security and privacy analysis of BOF systems.
DOI: https://doi.org/10.1109/ICASSP.2014.6854312 | Pages: 3799-3803 | Published: 2014-05-04
Citations: 1