2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers


Attaining fundamental bounds on timing synchronization

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6289099

P. Bidigare, Upamanyu Madhow, R. Mudumbai, D. Scherber

In this paper, we propose an algorithm for timing synchronization that attains fundamental bounds derived by Weiss and Weinstein. These bounds state that, in addition to improving with time-bandwidth product and signal-to-noise ratio (SNR), timing accuracy also improves as the carrier frequency gets larger, if the SNR is above a threshold. Our algorithm essentially follows the logic of the Weiss-Weinstein bound, and has the following stages: coarse estimation using time domain samples, fine-grained estimation using a Newton algorithm in the frequency domain, and final refinement to within a small fraction of a carrier cycle. While the results here are of fundamental interest, we are motivated to push the limits of synchronization to enable the tight coordination required for emulating virtual antenna arrays using a collection of cooperating nodes.
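As a rough illustration of the first two stages only (coarse estimation from time-domain samples, then Newton refinement in the frequency domain), here is a minimal Python sketch. It assumes a noiseless, circularly delayed, band-limited test signal and a naive O(N²) DFT; the helper names (`make_signal`, `coarse_delay`, `fine_delay`) are hypothetical, the carrier-cycle refinement stage is not modeled, and this is not the authors' algorithm.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def signed_freqs(n):
    # Angular frequency 2*pi*k/N with bin k mapped into [-N/2, N/2).
    return [2 * math.pi * (k if k < n // 2 else k - n) / n for k in range(n)]


def make_signal(n, tau):
    """Hypothetical band-limited test signal (5 tones), circularly delayed by tau samples."""
    spec = [0j] * n
    w = signed_freqs(n)
    for k, amp in enumerate([1.0, 0.8, 0.6, 0.5, 0.4], start=1):
        spec[k] = amp * cmath.exp(1j * 0.7 * k) * cmath.exp(-1j * w[k] * tau)
        spec[n - k] = spec[k].conjugate()  # Hermitian symmetry keeps the signal real
    return [v.real for v in idft(spec)]


def coarse_delay(ref, rx):
    """Integer lag maximizing the circular time-domain cross-correlation."""
    n = len(ref)

    def corr(lag):
        return sum(rx[(t + lag) % n] * ref[t] for t in range(n))

    return max(range(n), key=corr)


def fine_delay(ref, rx, tau0, iters=10):
    """Newton iterations maximizing J(tau) = Re sum_k R_k conj(S_k) exp(j w_k tau)."""
    n = len(ref)
    s_spec, r_spec = dft(ref), dft(rx)
    w = signed_freqs(n)
    x = [r_spec[k] * s_spec[k].conjugate() for k in range(n)]
    tau = float(tau0)
    for _ in range(iters):
        rot = [x[k] * cmath.exp(1j * w[k] * tau) for k in range(n)]
        d1 = sum((1j * w[k] * rot[k]).real for k in range(n))  # J'(tau)
        d2 = sum((-(w[k] ** 2) * rot[k]).real for k in range(n))  # J''(tau)
        if d2 == 0:
            break
        tau -= d1 / d2
    return tau
```

With `ref = make_signal(64, 0.0)` and `rx = make_signal(64, 10.3)`, the coarse stage lands on lag 10 and the Newton stage then recovers the fractional part; the coarse stage matters because Newton only converges from inside the main lobe of the correlation.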


Citations: 30

Audio event detection from acoustic unit occurrence patterns

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6287923

Anurag Kumar, Pranay Dighe, Rita Singh, Sourish Chaudhuri, B. Raj

In most real-world audio recordings, we encounter several types of audio events. In this paper, we develop a technique for detecting signature audio events that is based on identifying patterns of occurrences of automatically learned atomic units of sound, which we call Acoustic Unit Descriptors or AUDs. Experiments show that the methodology also works for detecting individual events and their boundaries in complex recordings.
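A minimal sketch of matching occurrence patterns of acoustic units, assuming the recording has already been decoded into a sequence of AUD labels. The unigram/bigram counts, cosine score, window, hop and threshold are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def ngram_counts(units, max_n=2):
    """Occurrence pattern of a unit sequence: counts of all 1..max_n-grams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def detect(units, template, win, hop, threshold):
    """Return (start, end) unit-index spans whose pattern matches the event template."""
    hits = []
    for start in range(0, max(len(units) - win, 0) + 1, hop):
        seg = units[start:start + win]
        if cosine(ngram_counts(seg), template) >= threshold:
            hits.append((start, start + win))
    return hits
```

For example, a template built from `["a", "b", "a", "b"]` flags only the window `(4, 8)` in `["c", "d", "c", "d", "a", "b", "a", "b", "c", "d", "c", "d"]` with `win=4, hop=4`; the span bounds give a coarse event boundary.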


Citations: 58

A Bayesian framework for robust speech enhancement under varying contexts

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288932

D. Hanumantha Rao Naidu, Sriram Srinivasan

Single-microphone speech enhancement algorithms that employ trained codebooks of parametric representations of speech spectra have been shown to be successful in the suppression of non-stationary noise, e.g., in mobile phones. In this paper, we introduce the concept of a context-dependent codebook, and look at two aspects of context: dependency on the particular speaker using the mobile device, and on the acoustic condition during usage (e.g., hands-free mode in a reverberant room). Such context-dependent codebooks may be trained on-line. A new scheme is proposed to appropriately combine the estimates resulting from the context-dependent and context-independent codebooks under a Bayesian framework. Experimental results establish that the proposed approach performs better than the context-independent codebook in the case of a context match and better than the context-dependent codebook in the case of a context mismatch.
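One common way to combine estimates under a Bayesian framework is to weight each model's estimate by its posterior probability given the observation. The sketch below assumes that MMSE-style weighting, with per-model log-likelihoods and priors supplied by the caller; it is an illustration, not the combination scheme derived in the paper.

```python
import math


def combine_estimates(estimates, log_likelihoods, priors):
    """Posterior-weighted combination of per-model estimate vectors.

    estimates[i] is model i's estimate (e.g. from the context-dependent or the
    context-independent codebook); log_likelihoods[i] is log p(y | model i).
    """
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    peak = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(estimates[0])
    combined = [sum(weights[i] * estimates[i][d] for i in range(len(estimates))) for d in range(dim)]
    return combined, weights
```

When the contexts match, the context-dependent model's likelihood dominates and its estimate wins; on a mismatch the weight shifts toward the context-independent estimate, which is the qualitative behavior the abstract reports.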


Citations: 6

Improving Arabic broadcast transcription using automatic topic clustering

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288907

Stephen M. Chu, L. Mangu

Latent Dirichlet Allocation (LDA) has been shown to be an effective model for augmenting n-gram language models in speech recognition applications. In this work, we aim to take advantage of the superior unsupervised learning ability of the framework and use it to uncover the topic structure embedded in the corpora in an entirely data-driven fashion. In addition, we describe a bi-level inference and classification method that allows topic clustering at the utterance level while preserving the document-level topic structures. We demonstrate the effectiveness of the proposed topic clustering pipeline in a state-of-the-art Arabic broadcast transcription system. Experiments show that optimizing the LM in the LDA topic space leads to a 5% reduction in language model perplexity. It is further shown that topic clustering and adaptation attain a 0.4% absolute word error rate reduction on the GALE Arabic task.
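For reference, perplexity is the exponential of the average negative log-probability a language model assigns to the test words, so a "5% reduction" compares two such numbers. The sketch below also shows linear interpolation of a background n-gram probability with a topic-specific one; that interpolation is an assumed, commonly used adaptation scheme, not necessarily the one used in this paper.

```python
import math


def perplexity(word_probs):
    """exp(-(1/N) * sum_i log p(w_i | h_i)) over the N test words."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


def interpolate(p_background, p_topic, lam):
    """Per-word linear interpolation of background and topic-adapted probabilities."""
    return [(1 - lam) * b + lam * t for b, t in zip(p_background, p_topic)]


def relative_reduction(pp_base, pp_new):
    return (pp_base - pp_new) / pp_base
```

A model that assigns probability 0.25 to every word has perplexity 4; raising the per-word probabilities through a better-matched topic model lowers perplexity accordingly.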


Citations: 2

Design and implementation of a fully integrated compressed-sensing signal acquisition system

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6289123

Juhwan Yoo, Stephen Becker, M. Monge, M. Loh, E. Candès, A. Emami-Neyestanak

Compressed sensing (CS) is a topic of tremendous interest because it provides theoretical guarantees and computationally tractable algorithms to fully recover signals sampled at a rate close to their information content. This paper presents the design of the first physically realized, fully integrated CS-based Analog-to-Information (A2I) pre-processor, known as the Random-Modulation Pre-Integrator (RMPI) [1]. The RMPI achieves 2 GHz bandwidth while digitizing samples at a rate 12.5× lower than the Nyquist rate. The success of this implementation is due to a coherent theory/algorithm/hardware co-design approach. This paper addresses key aspects of the design, presents simulation and hardware measurements, and discusses limiting factors in performance.
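To make the "random modulation, then integration" idea concrete, here is a discrete-time toy model: each channel multiplies Nyquist-rate samples by its own pseudo-random ±1 chipping sequence and integrates over a window, producing one output per window. The channel count, window length and seeded generator are illustrative assumptions, not the chip's actual parameters, and sparse recovery is not shown.

```python
import random


def rmpi_measure(x, num_channels, window, seed=0):
    """Toy random-modulation pre-integrator on Nyquist-rate samples x.

    Returns per-channel integrator outputs and the +/-1 chipping sequences used.
    """
    rng = random.Random(seed)
    chips = [[rng.choice((-1, 1)) for _ in range(len(x))] for _ in range(num_channels)]
    outputs = []
    for c in range(num_channels):
        channel_out = []
        for start in range(0, len(x) - window + 1, window):
            # Modulate by the chipping sequence, then integrate (sum) over the window.
            channel_out.append(sum(chips[c][t] * x[t] for t in range(start, start + window)))
        outputs.append(channel_out)
    return outputs, chips
```

With 800 input samples, 8 channels and a 100-sample integration window, the sketch produces 64 outputs in total, i.e. an aggregate rate 12.5× below the input rate; recovering `x` from these linear measurements would then be a sparse-recovery problem.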


Citations: 73

A model structure integration based on a Bayesian framework for speech recognition

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288996

Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda

This paper proposes an acoustic modeling technique for speech recognition that is based on a Bayesian framework and uses multiple model structures. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by marginalizing over model parameters, and its effectiveness in HMM-based speech recognition has been reported. Although the basic idea underlying the Bayesian approach is to treat all parameters as random variables, conventional methods still select only one model structure. In the proposed method, multiple model structures are treated as latent variables and integrated within the Bayesian framework. Furthermore, we apply deterministic annealing to the training algorithm to estimate appropriate acoustic models. The proposed method effectively utilizes multiple model structures, especially in the early stage of training, which leads to better predictive distributions and improved recognition performance.
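A minimal sketch of how deterministic annealing can spread weight over several model structures early in training: posterior weights over structures are computed from log-scores divided by a temperature that decreases toward 1. The temperature-scaled softmax and the geometric schedule are assumptions chosen for illustration; the paper's training algorithm is not reproduced here.

```python
import math


def structure_weights(log_scores, temperature):
    """Annealed posterior over model structures: softmax(log_scores / T)."""
    scaled = [s / temperature for s in log_scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def annealing_schedule(t_start, t_end, steps):
    """Geometric temperature schedule from t_start down to t_end (steps >= 2)."""
    ratio = (t_end / t_start) ** (1 / (steps - 1))
    return [t_start * ratio ** i for i in range(steps)]
```

At a high temperature the weights are nearly uniform, so all structures contribute; as the temperature falls to 1 the weights approach the ordinary posterior and concentrate on the best-scoring structures.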


Citations: 0

Generalized k-labelset ensemble for multi-label classification

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288315

Hung-Yi Lo, Shou-de Lin, H. Wang

The label powerset (LP) method is one category of multi-label learning algorithms. It reduces the multi-label classification problem to a multi-class classification problem by treating each distinct combination of labels in the training set as a different class. This paper proposes a basis expansion model for multi-label classification, where each basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the multi-label ground truth. We derive an analytic solution to learn the coefficients efficiently. We have conducted experiments on several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. The results show that our method performs better than, or competitively with, the other methods.
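The LP reduction described above can be sketched in a few lines: restrict each label vector to a random k-labelset, then map every distinct label combination to a class id. The function names are hypothetical, and the learning of the expansion coefficients is not shown.

```python
import random


def random_k_labelset(num_labels, k, seed=0):
    """Pick k distinct label indices at random (sorted for readability)."""
    return sorted(random.Random(seed).sample(range(num_labels), k))


def label_powerset(Y, labelset):
    """Map each label vector, restricted to labelset, to a multi-class target."""
    classes = {}
    targets = []
    for y in Y:
        key = tuple(y[j] for j in labelset)
        if key not in classes:
            classes[key] = len(classes)  # each new combination becomes a new class
        targets.append(classes[key])
    return targets, classes


def decode(class_id, classes, labelset):
    """Turn a predicted LP class id back into {label index: 0/1}."""
    inverse = {cid: key for key, cid in classes.items()}
    return dict(zip(labelset, inverse[class_id]))
```

For `Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]` and labelset `[0, 2]`, the targets are `[0, 1, 0, 2]`; any multi-class classifier can then be trained on them, and `decode` maps its predictions back to per-label decisions.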


Citations: 3

On the identifiability of multi-observer hidden Markov models

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288268

H. Nguyen, M. Roughan

Most large attacks on the Internet are distributed. As a result, such attacks are only partially observed by any one Internet service provider (ISP). Detection would be significantly easier with pooled observations, but privacy concerns often limit the information that providers are willing to share. Multi-party secure distributed computation provides a means for combining observations without compromising privacy. In this paper, we show the benefits of this approach, the most notable of which is that combinations of observations solve identifiability problems in existing approaches for detecting network attacks.
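To show what "combining observations" means for an HMM, here is a forward-algorithm sketch in which several observers see the same hidden state, assuming their observations are conditionally independent given that state so the joint emission probability is a product. Pooling is done in the clear here; the secure multi-party computation the abstract relies on is not modeled, and the function name is hypothetical.

```python
def forward_multi_observer(pi, A, emissions, observations):
    """Likelihood of pooled observations under a multi-observer HMM.

    pi[s]: initial state probabilities
    A[i][j]: P(state j at time t | state i at time t-1)
    emissions[o][s][sym]: observer o's emission probabilities
    observations[t][o]: symbol seen by observer o at time t
    """
    n = len(pi)

    def joint_emission(state, obs_t):
        p = 1.0
        for B, sym in zip(emissions, obs_t):
            p *= B[state][sym]  # conditional independence across observers
        return p

    alpha = [pi[s] * joint_emission(s, observations[0]) for s in range(n)]
    for obs_t in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * joint_emission(j, obs_t)
                 for j in range(n)]
    return sum(alpha)
```

With two states, an identity transition matrix and one observer, a single symbol is uninformative (likelihood 0.5); adding a second observer with the same emission matrix sharpens the evidence about the hidden state, which is the kind of gain pooled observations provide.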


Citations: 4

Adaptive parameter selection for asynchronous intrafascicular multi-electrode stimulation

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6287993

M. A. Frankel, G. Clark, S. Meek, R. Normann, V. J. Mathews

This paper describes an adaptive algorithm for selecting per-electrode stimulus intensities and inter-electrode stimulation phasing to achieve desired isometric plantar-flexion forces via asynchronous, intrafascicular multi-electrode stimulation. The algorithm employed a linear model of force production and a gradient descent approach for updating the parameters of the model. The adaptively selected model stimulation parameters were validated in experiments in which stimulation was delivered via a Utah Slanted Electrode Array that was acutely implanted in the sciatic nerve of an anesthetized feline. In simulations and experiments, desired steps in force were evoked and exhibited short time-to-peak (< 0.5 s), low overshoot (< 10%), low steady-state error (< 4%), and low steady-state ripple (< 12%), with rapid convergence of stimulation parameters. For periodic desired forces, the algorithm was able to converge quickly, and experimental trials showed low amplitude error (mean error < 10% of maximum force) and short time delay (< 250 ms).
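A minimal simulation of the adaptive loop, assuming force is a linear combination of per-electrode intensities with unknown gains. Each trial picks the minimum-norm intensities that the current model predicts will hit the target, measures the resulting force on a simulated plant, and takes a normalized gradient-descent (NLMS) step on the squared prediction error. The NLMS update, the intensity rule and all names are illustrative assumptions; inter-electrode phasing and force dynamics are not modeled.

```python
def select_intensities(g_hat, target):
    """Minimum-norm intensities u satisfying g_hat . u == target."""
    norm2 = sum(g * g for g in g_hat)
    return [target * g / norm2 for g in g_hat]


def nlms_update(g_hat, u, measured, mu):
    """Normalized gradient-descent step on the squared force-prediction error."""
    predicted = sum(g * x for g, x in zip(g_hat, u))
    err = measured - predicted
    norm2 = sum(x * x for x in u)
    return [g + mu * err * x / norm2 for g, x in zip(g_hat, u)]


def run(g_true, g_init, target, mu=0.5, trials=30):
    """Simulate repeated trials against a linear plant with gains g_true."""
    g_hat = list(g_init)
    forces = []
    for _ in range(trials):
        u = select_intensities(g_hat, target)
        force = sum(g * x for g, x in zip(g_true, u))  # simulated measurement
        forces.append(force)
        g_hat = nlms_update(g_hat, u, force, mu)
    return forces, g_hat
```

Starting from equal gain estimates `[1.0, 1.0, 1.0]` against a plant with gains `[2.0, 1.0, 0.5]` and a target of 3.0, the first trial overshoots (force 3.5) and the force error then shrinks geometrically toward zero, even though the individual gains are not fully identified.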


Citations: 2

Robust speech recognition through selection of speaker and environment transforms

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-03-25 DOI: 10.1109/ICASSP.2012.6288878

Raghavendra Bilgi, Vikas Joshi, S. Umesh, Luz García, M. C. Benítez

In this paper, we address the problem of robustness to both noise and speaker variability in automatic speech recognition (ASR). We propose the use of pre-computed noise and speaker transforms, and the optimal combination of these two transforms is chosen at test time using the maximum-likelihood (ML) criterion. These pre-computed transforms are obtained during training from data recorded under the different noise conditions usually encountered in that particular ASR task. The environment transforms are obtained during training using the constrained-MLLR (CMLLR) framework, while for the speaker transforms we use the analytically determined linear-VTLN matrices. Even though the exact noise environment may not be encountered during test, the ML-based choice of the closest environment transform provides "sufficient" cleaning; this is corroborated by experimental results on the Aurora-2 task, where performance is comparable to histogram equalization and Vector Taylor Series approaches. The proposed method is simple, since it involves only the choice of pre-computed environment and speaker transforms; therefore, unlike many other speaker- and noise-compensation methods, it can be applied with very little test data.
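The selection step can be illustrated with a toy one-dimensional example: each candidate pair of pre-computed transforms is applied to the test features, the log-likelihood under a fixed Gaussian model is computed (including the log-Jacobian term that feature-space transforms such as CMLLR require), and the best pair is kept. The scalar affine transforms stand in for the CMLLR and linear-VTLN matrices, and all transform names and values are hypothetical.

```python
import itertools
import math


def gaussian_loglik(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def transformed_loglik(features, env, spk, mean, var):
    """Log-likelihood of features after environment then speaker affine transforms."""
    (a_env, b_env), (a_spk, b_spk) = env, spk
    total = 0.0
    for x in features:
        y = a_spk * (a_env * x + b_env) + b_spk
        total += gaussian_loglik(y, mean, var) + math.log(abs(a_spk * a_env))  # Jacobian
    return total


def select_transforms(features, env_transforms, spk_transforms, mean=0.0, var=1.0):
    """Exhaustive ML search over all (environment, speaker) transform pairs."""
    best = max(itertools.product(env_transforms.items(), spk_transforms.items()),
               key=lambda pair: transformed_loglik(features, pair[0][1], pair[1][1], mean, var))
    return best[0][0], best[1][0]
```

For test features centered near 3 with unit spread, a search over environment offsets `{"clean": (1.0, 0.0), "offset3": (1.0, -3.0), "offset6": (1.0, -6.0)}` and speaker scalings `{"none": (1.0, 0.0), "warp09": (0.9, 0.0), "warp11": (1.1, 0.0)}` returns `("offset3", "none")`. Because only a handful of pre-computed candidates are scored, very little test data is needed.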


Pages: 4333-4336

Citations: 0
