2008 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Papers

Open-vocabulary spoken term detection using graphone-based hybrid recognition systems

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518841

Murat Akbacak, D. Vergyri, A. Stolcke

We address the problem of retrieving out-of-vocabulary (OOV) words/queries from audio archives for the spoken term detection (STD) task. Many STD systems use the output of an automatic speech recognition (ASR) system that has a limited, fixed vocabulary, and so cannot detect rare words of high information content, such as named entities. Since such words are often of great interest for a retrieval task, it is important to index spoken archives in a way that allows a user to search for an OOV query/term. In this work, we employ hybrid recognition systems which contain both words and subword units (graphones) to generate hybrid lattice indexes. We use a word-based STD system as our baseline, and present improvements by employing our proposed hybrid STD system that uses words plus graphones on the English broadcast news genre of the 2006 NIST STD task.

Citations: 59

Embedded transform coding of audio signals by model-based bit plane coding

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518534

Thi Minh Nguyet Hoang, M. Oger, S. Ragot, M. Antonini

This paper proposes a new model-based method for transform coding of audio signals. The input signal is mapped into a "perceptual" domain by a linear-predictive weighting filter followed by a modified discrete cosine transform (MDCT). To provide bitstream scalability, model-based bit plane coding is then applied with respect to the mean square error (MSE) criterion. We present methods to estimate the symbol probability in bit planes assuming a generalized Gaussian model for the distribution of MDCT coefficients. We compare the performance of the proposed bitstream scalable coder with stack-run coding and ITU-T G.722.1. Objective and subjective quality results are presented. The proposed coder is equivalent to or slightly worse than the reference coders, but has the advantage of being scalable. The performance penalty due to bitstream scalability is evident at low bitrates.
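
The scalability idea in this abstract can be illustrated with a minimal sketch of plain bit plane decomposition. This is not the authors' coder: the model-based probability estimation and entropy coding are omitted, and the function names and the integer coefficients are made up for illustration. It only shows that decoding more leading bit planes never increases the MSE.

```python
def to_bit_planes(coeffs, num_planes):
    """Split integer magnitudes into bit planes, most significant plane first."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def from_bit_planes(signs, planes, num_planes):
    """Rebuild coefficients from however many leading planes were received."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # weight of this plane
        mags = [m | (b << shift) for m, b in zip(mags, plane)]
    return [s * m for s, m in zip(signs, mags)]


def mse(a, b):
    """Mean square error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical quantized transform coefficients (magnitudes below 2**4).
coeffs = [13, -6, 3, 0, -1, 9]
signs, planes = to_bit_planes(coeffs, num_planes=4)

# Truncating the bitstream after k planes gives a coarser reconstruction.
errors = [mse(coeffs, from_bit_planes(signs, planes[:k], 4)) for k in range(5)]
```

Here `errors[k]` is the MSE after decoding the first `k` planes, so a decoder can stop at any plane boundary and still obtain a usable approximation.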

Citations: 4

On optimal anchor node placement in sensor localization by optimization of subspace principal angles

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518103

J. Ash, R. Moses

In sensor network self-localization, anchor nodes provide a convenient means to disambiguate scene translation and rotation, thereby affording estimates in an absolute coordinate system. However, localization performance depends on the positions of the anchor nodes relative to the unknown-location nodes. Conventional wisdom in the literature is that anchor nodes should be placed around the perimeter of the network. In this paper, we show analytically why this strategy works well universally. We demonstrate that perimeter placement forces the information provided by the anchor constraints to closely align with the subspace that cannot be estimated from inter-node measurements: the subspace of translations and rotations. Examples quantify the efficacy of perimeter placement of anchors.

Citations: 71

Unsupervised anchor space generation for similarity measurement of general audio

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517544

Lie Lu, A. Hanjalic

Reliably measuring similarity between audio clips is critical to many applications. As opposed to the conventional way of measuring audio similarity using low-level features directly, in this paper we consider the similarity computation using an anchor space. Each dimension of such a space corresponds to a semantic category (anchor). Mapping an audio clip onto this space results in a vector, which indicates the membership probability of this audio clip with respect to each semantic category. The more similar the mappings of two audio clips, the more similar the clips are. While an anchor space is typically generated in a supervised fashion, a supervised approach is infeasible in many realistic scenarios where audio content semantics are too diverse or simply unknown a priori. We therefore propose an unsupervised approach to anchor space generation. Spectral clustering is employed to cluster the audio clips with similar low-level features, and the obtained clusters are then adopted as semantic categories. Using this semantic space for audio similarity computation shows a considerable accuracy improvement (7% on mAP) in an audio retrieval system, compared with the conventional approach based on low-level features.
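
As a rough illustration of comparing clips in an anchor space, the sketch below maps a low-level feature vector to a membership vector over a few anchors and compares two clips by the cosine of their membership vectors. Everything here is an assumption for illustration: the anchors are hand-picked centroids rather than clusters found by spectral clustering, the softmax over negative squared distance stands in for whatever membership model the paper uses, and cosine similarity is only one possible comparison.

```python
import math


def membership(feature, anchors):
    """Soft membership of one clip in each anchor.

    Uses a softmax over negative squared Euclidean distance (an assumed model).
    """
    d2 = [sum((f - a) ** 2 for f, a in zip(feature, anchor)) for anchor in anchors]
    d_min = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical anchors (cluster centroids) in a 2-D low-level feature space.
anchors = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]

clip_a = membership([0.5, 0.2], anchors)  # near anchor 0
clip_b = membership([0.3, 0.4], anchors)  # also near anchor 0
clip_c = membership([3.8, 0.1], anchors)  # near anchor 1

sim_ab = cosine(clip_a, clip_b)
sim_ac = cosine(clip_a, clip_c)
```

Clips `clip_a` and `clip_b` fall mostly into the same anchor, so `sim_ab` comes out larger than `sim_ac`.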

Citations: 3

Maximum likelihood approach to speech enhancement for noisy reverberant signals

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518677

Takuya Yoshioka, T. Nakatani, T. Hikichi, M. Miyoshi

This paper proposes a speech enhancement method for signals contaminated by room reverberation and additive background noise. The following conditions are assumed: (1) The spectral components of speech and noise are statistically independent Gaussian random variables. (2) The convolutive distortion channel is modeled as an auto-regressive system in each frequency bin. (3) The power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be stationary and given in advance. Under these conditions, the proposed method estimates the parameters of the channel and those of the all-pole speech model based on the maximum likelihood estimation method. Experimental results showed that the proposed method successfully suppressed the reverberation and additive noise from three-second noisy reverberant signals when the reverberation time was 0.5 seconds and the reverberant signal to noise ratio was 10 dB.

Citations: 7

Image spam hunter

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517972

Yan Gao, Ming Yang, Xiaonan Zhao, Bryan Pardo, Ying Wu, T. Pappas, A. Choudhary

Spammers are constantly creating sophisticated new weapons in their arms race with anti-spam technology, the latest of which is image-based spam. The newest image-based spam uses simple image processing technologies to vary the content of individual messages, e.g. by changing foreground colors, backgrounds, or font types, or even by rotating the images and adding artifacts to them. Such messages pose great challenges to conventional spam filters. In this paper, we propose a system that uses a probabilistic boosting tree to determine whether an incoming image is spam based on global image features, i.e. color and gradient orientation histograms. The system identifies spam without the need for OCR and is robust in the face of the kinds of variation found in current spam images. Evaluation results show the system correctly classifies 90% of spam images while mislabeling only 0.86% of non-spam images as spam.
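
To make the notion of a global image feature concrete, here is a minimal sketch of a normalized joint RGB color histogram. It is an illustration under assumptions, not the authors' feature extractor: the bin count, the pixel-list input format, and the function name are all hypothetical, and the gradient orientation histogram and the probabilistic boosting tree classifier are not shown. Because the histogram ignores pixel positions, rearranging pixels (as a 90° or 180° rotation does) leaves it unchanged.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) tuples with values in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Combine the three per-channel bin indices into one flat index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    n = len(pixels)
    return [h / n for h in hist]


# A tiny made-up "image": mostly white background with some red text pixels.
image = [(255, 255, 255)] * 12 + [(200, 10, 10)] * 4

# Reversing the pixel order mimics a 180-degree rotation of the flattened image.
rotated = list(reversed(image))

hist_original = color_histogram(image)
hist_rotated = color_histogram(rotated)
```

The two histograms are identical, which hints at why position-independent features can tolerate the rotations the abstract mentions.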

Citations: 70

Towards analytical convergence analysis of proportionate-type NLMS algorithms

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518487

K. Wagner, M. Doroslovački

To date, no theoretical results have been developed to predict the performance of the proportionate normalized least mean square (PNLMS) algorithm or any of its cousin algorithms, such as the μ-law PNLMS (MPNLMS) and the ε-law PNLMS (EPNLMS). In this paper we develop an analytic approach to predicting the performance of the simplified PNLMS algorithm, which is closely related to the PNLMS algorithm. In particular, we demonstrate the ability to predict the mean square output error of the simplified PNLMS algorithm using our theory.
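
For readers unfamiliar with the algorithm family, the following is a minimal sketch of one commonly cited form of the PNLMS update, in which each coefficient receives a step-size gain roughly proportional to its current magnitude. It is not the simplified PNLMS variant analyzed in the paper, and the parameter values (`mu`, `rho`, `delta_p`, `eps`), the sparse test system, and the noise-free setup are assumptions chosen for illustration.

```python
import random


def pnlms_step(w, x, d, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-8):
    """One PNLMS iteration: returns the updated weights and the a priori error."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    # Floor keeps small or zero coefficients from stalling completely.
    w_max = max(abs(wi) for wi in w)
    floor = rho * max(delta_p, w_max)
    gamma = [max(floor, abs(wi)) for wi in w]
    mean_gamma = sum(gamma) / len(gamma)
    g = [gi / mean_gamma for gi in gamma]  # gains average to 1
    # Normalize by the gain-weighted input energy.
    norm = sum(gi * xi * xi for gi, xi in zip(g, x)) + eps
    w_new = [wi + mu * gi * xi * e / norm for wi, gi, xi in zip(w, g, x)]
    return w_new, e


random.seed(0)
true_w = [0.0, 0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0]  # made-up sparse system
num_taps = len(true_w)
w = [0.0] * num_taps
x = [0.0] * num_taps
initial_misalignment = sum(t * t for t in true_w)

for _ in range(500):
    x = [random.gauss(0.0, 1.0)] + x[:-1]  # shift in a new white-noise sample
    d = sum(t * xi for t, xi in zip(true_w, x))  # noise-free desired signal
    w, e = pnlms_step(w, x, d)

final_misalignment = sum((t - wi) ** 2 for t, wi in zip(true_w, w))
```

Comparing `final_misalignment` with `initial_misalignment` gives an empirical view of convergence; the paper's contribution is predicting such behavior analytically rather than by simulation.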

Citations: 29

Relation between joint optimizations for multiuser MIMO uplink and downlink with imperfect CSI

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518318

M. Ding, S. Blostein

Joint linear minimum sum mean-squared error (referred to as MSMSE) transmitter and receiver (transceiver) optimization problems are formulated for multiuser MIMO systems under a sum power constraint assuming imperfect channel state information (CSI). Both the uplink and the dual downlink are considered. Based on the Karush-Kuhn-Tucker (KKT) conditions associated with both problems, a relation between the two problems is discovered, which is termed the uplink-downlink duality in sum MSE under imperfect CSI. As a result, the MSMSEs in both links are the same and any admissible uplink design satisfying the KKT conditions can be translated for application to the downlink, and vice versa. Simulation results are provided to demonstrate the duality and show the impact of imperfect CSI.

Citations: 31

Gaussian Mixture Kalman predictive coding of LSFs

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518725

Shaminda Subasingha, M. Murthi, S. Andersen

Gaussian mixture model (GMM)-based predictive coding of line spectral frequencies (LSFs) has gained wide acceptance. In such coders, each mixture of a GMM can be interpreted as defining a linear predictive transform coder. In this paper we optimize each of these linear predictive transform coders using Kalman predictive coding techniques to present GMM Kalman predictive coding. In particular, we show how suitable modeling of quantization noise leads to an adaptive a-posteriori GMM that defines a signal-adaptive predictive coder that provides superior coding of LSFs in comparison with the baseline GMM predictive coder. Moreover, we show how running the Kalman predictive coders to convergence can be used to design a stationary predictive coding system which again provides superior coding of LSFs but now with no increase in run-time complexity over the baseline.

Citations: 4

Speaker normalization based on subglottal resonances

Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518600

Shizhen Wang, A. Alwan, Steven M. Lulich

Speaker normalization typically focuses on variabilities of the supra-glottal (vocal tract) resonances, which constitute a major cause of spectral mismatch. Recent studies show that the subglottal airways also affect spectral properties of speech sounds. This paper presents a speaker normalization method based on estimating the second and third subglottal resonances. Since the subglottal airways do not change for a specific speaker, the subglottal resonances are independent of the sound type (i.e., vowel, consonant, etc.) and remain constant for a given speaker. This context-free property makes the proposed method suitable for limited data speaker adaptation. This method is computationally more efficient than maximum-likelihood based VTLN, with performance better than VTLN especially for limited adaptation data. Experimental results confirm that this method performs well in a variety of testing conditions and tasks.

Citations: 18
