2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles

A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854454

Kehuang Li, Zhen Huang, You-Chi Cheng, Chin-Hui Lee

We propose a maximal figure-of-merit (MFoM) learning framework to directly maximize mean average precision (MAP), which is a key performance metric in many multi-class classification tasks. Conventional classifiers based on support vector machines cannot be easily adapted to optimize the MAP metric. On the other hand, classifiers based on deep neural networks (DNNs) have recently been shown to deliver strong discrimination capability in both automatic speech recognition and image classification. However, DNNs are usually optimized with the minimum cross entropy criterion. In contrast to most conventional classification methods, our proposed approach can be formulated to embed DNNs and MAP into the objective function to be optimized during training. The combination of the proposed maximum MAP (MMAP) technique and DNNs introduces nonlinearity to the linear discriminant function (LDF) in order to increase the flexibility and discriminant power of the original MFoM-trained LDF based classifiers. Tested on both automatic image annotation and audio event classification, the proposed approach shows consistent MAP improvements on both datasets compared with other state-of-the-art classifiers that do not use MMAP.
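
For readers unfamiliar with the metric, the following is a minimal sketch of how MAP is commonly computed at evaluation time: the average precision (AP) of each class is taken over the ranked scores, and the per-class values are averaged. It illustrates only the evaluation metric and does not reproduce the MFoM training objective described in the abstract. The function names and the row-per-class layout are assumptions made for illustration.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    # Rank sample indices by descending score (ties keep their input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # By convention here, a class with no positives contributes an AP of 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """MAP: unweighted mean of per-class AP. Rows are classes, columns are samples."""
    aps = [average_precision(l, s) for l, s in zip(label_matrix, score_matrix)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy data: 2 classes, 4 samples.
    labels = [[1, 0, 1, 0], [0, 1, 0, 0]]
    scores = [[0.9, 0.8, 0.7, 0.1], [0.1, 0.9, 0.3, 0.2]]
    print(mean_average_precision(labels, scores))
```

Because AP depends on the ranking of scores rather than on their values, it is piecewise constant in the classifier outputs, which is one reason it is harder to optimize directly than cross entropy.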

Citations: 40

PDE-based interpolation method for optically visualized sound field

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854501

K. Yatabe, Yasuhiro Oikawa

An effective way to understand the behavior of a sound field is to visualize it. An optical measurement method is a suitable option for this, as it enables contactless, non-destructive measurement. After a sound field is measured, the data must be interpolated for a smooth visualization. However, conventional interpolation methods cannot provide a physically meaningful result, especially when the measurement conditions cause a moiré effect. In this paper, a special interpolation method for an optically visualized sound field based on the Kirchhoff-Helmholtz integral equation is proposed.

Citations: 11

Simplified MIMO relay design for multicasting from multiple-sources

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854098

Muhammad R. A. Khandaker, Y. Rong

In this paper, we consider a dual-hop multicasting multiple-input multiple-output (MIMO) relay system where multiple transmitters multicast their own messages to a group of receivers with the aid of a relay node, and all nodes are equipped with multiple antennas. We aim to minimize the maximal mean-squared error (MSE) of the signal waveform estimation among all receivers, subject to power constraints at the transmitters and the relay node. We propose a low-complexity solution to the problem under a mild approximation. In particular, we show that under a (moderately) high signal-to-noise ratio (SNR) assumption, the min-max optimization problem can be solved using the semidefinite programming (SDP) technique. Numerical simulations demonstrate the effectiveness of the proposed algorithm.
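
As a rough illustration of the problem structure, a min-max MSE design of this kind can be written in the standard epigraph form below, where an auxiliary variable bounds every receiver's MSE. The notation is assumed for illustration and is not taken from the paper: B_i is the precoder of transmitter i, F is the relay matrix, K is the number of receivers, N is the number of transmitters, and P_i and P_r are the transmit and relay power budgets.

```latex
\begin{aligned}
\min_{\{\mathbf{B}_i\},\,\mathbf{F},\,t}\quad & t \\
\text{s.t.}\quad
  & \mathrm{MSE}_k\bigl(\{\mathbf{B}_i\},\mathbf{F}\bigr) \le t, && k = 1,\dots,K, \\
  & \operatorname{tr}\bigl(\mathbf{B}_i\mathbf{B}_i^{H}\bigr) \le P_i, && i = 1,\dots,N, \\
  & P_{\mathrm{relay}}\bigl(\{\mathbf{B}_i\},\mathbf{F}\bigr) \le P_r .
\end{aligned}
```

The epigraph step itself is generic; the abstract states only that the high-SNR approximation makes the resulting problem solvable by SDP, and the details of that reformulation are not reproduced here.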

Citations: 4

The design of Ambisonic reproduction system based on dynamic gain parameters

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854444

Bing Bu, C. Bao, Mao-shen Jia, Rong Zhu

This paper describes a design approach for an Ambisonic reproduction system based on dynamic gain parameters (DGP). In conventional approaches, fixed gain parameters are often optimized to minimize an overall objective function for the whole 360° sound stage. The proposed approach has the advantage that the gain parameters vary with the angles of the source objects. DGP overcomes the optimization tradeoff among different angles and achieves an optimal solution at each position. Source localizations of the B-Format signals were estimated in frequency bands in order to match the corresponding gain parameters. For the synthesized signals, the process was simplified by the given spatial information. Using head-related transfer function (HRTF) analysis, the proposed approach was found to be significantly better than the reference approaches in interaural time difference (ITD) and interaural level difference (ILD).

Citations: 1

Purify: A new algorithmic framework for next-generation radio-interferometric imaging

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-06-02 DOI: 10.1109/ICASSP.2014.6854636

R. Carrillo, J. McEwen, Y. Wiaux

In recent works, compressed sensing and convex optimization techniques have been applied to radio-interferometric imaging showing the potential to outperform state-of-the-art imaging algorithms in the field. We review our latest contributions, which leverage the versatility of convex optimization to both handle realistic continuous visibilities and offer a highly parallelizable structure paving the way to significant acceleration of the reconstruction and high-dimensional data scalability. The new algorithmic structure, promoted in a new software PURIFY (beta version), relies on the simultaneous-direction method of multipliers (SDMM). The performance of various sparsity priors is evaluated through simulations in the continuous visibility setting, confirming the superiority of our recent average sparsity approach SARA.

Citations: 1

A pattern recognition approach based on electrodermal response for pathological mood identification in bipolar disorders

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854272

A. Lanatà, A. Greco, G. Valenza, E. Scilingo

This paper reports the results of a pattern recognition technique for classifying pathological mental states of bipolar disorders using information gathered from the electrodermal response. The rationale behind this work is that autonomic nervous system (ANS) dynamics, non-invasively quantified through electrodermal response processing, are altered by the specific mood state. Starting from the hypothesis that bipolar disorders are associated with affective dysfunctions, we processed data gathered from four bipolar patients through eleven experimental trials during which an ad-hoc emotional stimulation was administered. Intra- and inter-subject variability were investigated. We show that, using a deconvolution-based approach to estimate sympathetic ANS markers and simple k-Nearest Neighbor algorithms, the proposed methodology is able to discern up to three mood states such as depression, hypo-mania, and euthymia with an average intra-subject accuracy greater than 98% and inter-subject accuracy greater than 82%.
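
The classification stage mentioned in the abstract relies on k-Nearest Neighbor voting. The sketch below shows that generic step only; it does not implement the deconvolution-based feature extraction, and the two-dimensional feature vectors, label strings, and function name are hypothetical placeholders rather than the paper's actual markers.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(
    train_x: Sequence[Tuple[float, ...]],
    train_y: Sequence[str],
    query: Tuple[float, ...],
    k: int = 3,
) -> str:
    """Return the majority label among the k training points closest to `query`."""
    # Pair each Euclidean distance with its label and sort by distance.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, made-up feature vectors for two mood states.
    train_x = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.1), (0.9, 0.8), (0.8, 0.9), (1.0, 1.0)]
    train_y = ["depression"] * 3 + ["euthymia"] * 3
    print(knn_predict(train_x, train_y, (0.15, 0.15)))
```

In an intra-subject evaluation the training and query points would come from the same patient, while an inter-subject evaluation would hold out whole patients; the abstract reports both settings.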

Citations: 23

Speech and audio loudness depending on telephone audio bandwidth and codec: A subjective testing approach

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6853812

I. Edjekouane, C. Plapous, C. Quinquis, S. Meunier

In this paper, we propose a new approach for the subjective assessment of the loudness of complex audio signals such as speech or music. This two-stage approach makes it possible to study how the frequency bandwidth and different kinds of codecs influence loudness. In the first stage, the individual loudness function of each subject is estimated using a specific 100-point response scale. In the second stage, the subject uses the same scale to evaluate the loudness of each sample processed by filtering or by coding/decoding. The loudness obtained in points is then converted into a loudness level in phons using the estimated individual loudness function. Results show that loudness increases as the bandwidth is extended up to super-wideband. Similar behavior is observed when codecs are applied.

Citations: 1

A novel cepstral representation for timbre modeling of sound sources in polyphonic mixtures

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6855057

Z. Duan, Bryan Pardo, L. Daudet

We propose a novel cepstral representation called the uniform discrete cepstrum (UDC) to represent the timbre of sound sources in a sound mixture. Unlike the ordinary cepstrum and MFCC, which have to be calculated from the full magnitude spectrum of a source after source separation, UDC can be calculated directly from isolated spectral points that are likely to belong to the source in the mixture spectrum (e.g., non-overlapping harmonics of a harmonic source). Existing cepstral representations that have this property are the discrete cepstrum and the regularized discrete cepstrum. Compared to the proposed UDC, however, they are less effective and more complex to compute. The key advantage of UDC is that it uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points. We derive the mathematical relations between these cepstral representations, and compare their timbre modeling performances in the task of instrument recognition in polyphonic audio mixtures. We show that UDC and its mel-scale variant MUDC significantly outperform all the other representations.

Citations: 14

Calibration and multiple system fusion for spoken term detection using linear logistic regression

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854985

Julien van Hout, L. Ferrer, D. Vergyri, N. Scheffer, Yun Lei, V. Mitra, S. Wegmann

State-of-the-art calibration and fusion approaches for spoken term detection (STD) systems currently rely on a multi-pass approach where the scores are calibrated, then fused, and finally re-calibrated to obtain a single decision threshold across keywords. While the above techniques are theoretically correct, they rely on meta-parameter tuning and are prone to over-fitting. This study presents an efficient and effective score calibration technique for keyword detection that is based on the logistic regression calibration approach commonly used in forensic speaker identification. The technique applies seamlessly to both single systems and to system fusion, and enables optimization for specific keyword detection evaluation functions. We run experiments on a Vietnamese STD task, comparing the technique with more empirical calibration and fusion schemes and demonstrate that we can achieve comparable or better performance in terms of the NIST ATWV metric with a more elegant solution.
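
To make the idea of linear logistic regression calibration and fusion concrete, the sketch below fits one weight per system plus a bias so that the weighted sum of raw detection scores behaves like a calibrated log-odds score. It is a generic illustration under stated assumptions, not the authors' implementation: it uses plain batch gradient descent, ignores keyword-specific handling and any ATWV-oriented weighting, and all function names and toy scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fused_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Affine fusion: calibrated log-odds = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear_fusion(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit fusion weights by minimizing cross entropy with batch gradient descent.

    `scores[i]` holds the raw scores of every system for candidate detection i;
    `labels[i]` is 1 for a true hit and 0 for a false alarm.
    """
    n, n_sys = len(scores), len(scores[0])
    w, b = [0.0] * n_sys, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_sys, 0.0
        for x, y in zip(scores, labels):
            err = sigmoid(fused_score(w, b, x)) - y  # d(loss)/d(logit)
            for j in range(n_sys):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical raw scores from two systems for six candidate detections.
    scores = [[2.0, 1.5], [1.0, 2.0], [0.5, 1.0], [-1.0, -0.5], [-2.0, -1.0], [-0.5, -1.5]]
    labels = [1, 1, 1, 0, 0, 0]
    w, b = fit_linear_fusion(scores, labels)
    print(w, b, fused_score(w, b, [1.2, 0.8]))
```

With a single system the same code reduces to score calibration (one scale and one offset), which matches the abstract's point that one technique covers both calibration and fusion. Once scores are calibrated log-odds, a single decision threshold can in principle be shared across keywords.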

Citations: 11

Simulation-driven emulation of collaborative algorithms to assess their requirements for a large-scale WSN implementation

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6855232

D. Manatakis, Michael G. Nennes, I. Bakas, E. Manolakos

Before a decentralized wireless sensor network (WSN) algorithm is deployed in the field, it is essential to assess how the communication and energy costs of its implementation scale as the network size increases. Simulations are commonly used for this purpose, especially for large-scale environmental monitoring applications. However, it is difficult to evaluate energy consumption, processing and memory requirements before the algorithm is actually ported to a real WSN platform. We propose a method for emulating the operation of collaborative algorithms in large-scale WSNs by re-using a small number of available real sensor nodes. We demonstrate the potential of the proposed simulation-driven WSN emulation approach by using it to estimate how communication and energy costs scale with the network's size when implementing a collaborative algorithm we developed in [12] for tracking the spatiotemporal evolution of a progressing environmental hazard.

Citations: 4
