Title: Speech quality objective assessment using neural network
Authors: Q. Fu, Kechu Yi, Mingui Sun
Venue: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861932

This paper presents a novel method for objective assessment of speech quality based on a one-step strategy using a feedforward neural network. Currently, almost all existing methods for this assessment can be regarded as two-step strategies, requiring a distortion computation followed by a mapping from the average distortion value to the mean opinion score (MOS). Our new method combines these two steps by means of a neural network which can incorporate the perceptual properties of the human auditory system and provide an MOS estimate directly. Our theoretical analysis and experimental results suggest that this method of MOS estimation significantly outperforms the traditional methods. The correlation coefficient between the subjective test score and the objective MOS estimate reaches approximately 0.95.
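The one-step idea, mapping perceptual features straight to an MOS estimate with a single network instead of distortion-then-regression, can be sketched as below. The synthetic features, labels, network size, and training schedule are all hypothetical placeholders, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: perceptual feature vectors (e.g. per-frame
# auditory-spectrum distances) and their subjective MOS labels in [1, 5].
X = rng.normal(size=(200, 8))
y = 4.0 / (1.0 + np.exp(-X @ rng.normal(size=8))) + 1.0

# One hidden tanh layer, trained by plain gradient descent on squared error.
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=16);      b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)       # hidden activations
    return h, h @ W2 + b2          # direct MOS estimate (one step)

_, pred0 = forward(X)
loss0 = np.mean((pred0 - y) ** 2)

lr = 0.05
for _ in range(500):
    h, pred = forward(X)
    err = pred - y
    gW2 = h.T @ err / len(X); gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - h ** 2)   # back-prop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
loss = np.mean((pred - y) ** 2)
```

The point of the one-step strategy is that the network output itself is the MOS estimate, so no separate distortion-to-MOS mapping has to be fitted afterwards.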
Title: Projective residual vector quantization and mapped residual pooling
Authors: Ryan P. Thomas, T. Moon
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859197

This paper points out two potential problems with residual vector quantization (RVQ): tree entanglement and non-projectiveness of the quantizer. The use of a boundary normalization mapping is proposed to pool all quantization residuals at a stage into identically shaped regions, reducing or eliminating entanglement. Also, a reconstruction codebook is proposed to eliminate the non-projectiveness. Results are presented for both random and image data.
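For readers unfamiliar with RVQ, here is a minimal two-stage version with direct-sum reconstruction, just to make the setting concrete. The codebooks are fixed random ones, not trained; the stage-2 codebook includes the zero vector so a residual can be left unquantized. This sketches the baseline scheme, not the paper's normalization or reconstruction-codebook fixes.

```python
import numpy as np

rng = np.random.default_rng(1)

def nearest(codebook, x):
    # Index of the nearest code vector for each row of x.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

X = rng.normal(size=(500, 2))
C1 = rng.normal(size=(8, 2))                              # stage-1 codebook
C2 = np.vstack([np.zeros((1, 2)),                         # zero code: "no refinement"
                0.3 * rng.normal(size=(7, 2))])           # stage-2 (residual) codebook

i1 = nearest(C1, X)
r = X - C1[i1]                 # stage-1 residuals (their shapes differ per cell:
i2 = nearest(C2, r)            # this is the entanglement issue the paper targets)
Xhat1 = C1[i1]
Xhat2 = C1[i1] + C2[i2]        # direct-sum reconstruction

e1 = np.mean((X - Xhat1) ** 2)
e2 = np.mean((X - Xhat2) ** 2)
```

Because the zero vector is available at stage 2, the second stage can never increase a point's error, so `e2 <= e1` by construction.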
Title: Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP
Authors: Y. Gong, Yu-Hung Kao
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.860202

Continuous speech recognition is a resource-intensive algorithm. Commercial dictation software requires more than 10 Mbytes of disk space to install and 32 Mbytes of RAM to run. A typical embedded system cannot afford this much RAM because of its high cost and power consumption; it also lacks a disk to store the large amount of static data (e.g. acoustic models). We have been working on the optimization of a small-vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP. This recognizer supports sophisticated continuous-density, tied-mixture Gaussians, parallel model combination, and a noise-robust utterance detection algorithm. The fixed-point version achieves the same performance as the floating-point version. The algorithm runs in real time on a 100 MHz, 16-bit, fixed-point Texas Instruments TMS320C5410, even for the most challenging task: continuous digit dialing with a hands-free microphone under driving conditions.
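The core discipline on a 16-bit fixed-point DSP is Q15 arithmetic: samples and weights live in [-1, 1) as 16-bit integers, products are accumulated in a wide accumulator, and the result is rescaled. The sketch below illustrates the idea in Python (the feature/weight vectors are invented, and a real C54x implementation would of course be assembly or C intrinsics, not Python).

```python
import numpy as np

rng = np.random.default_rng(6)

Q = 15  # Q15: 1 sign bit, 15 fractional bits

def to_q15(x):
    # Saturating conversion to Q15, as on a 16-bit fixed-point DSP.
    return int(np.clip(round(x * (1 << Q)), -32768, 32767))

def q15_dot(a, b):
    # Multiply-accumulate in a wide accumulator (the C54x has a 40-bit
    # accumulator for exactly this purpose), then rescale once at the end.
    acc = sum(x * y for x, y in zip(a, b))
    return acc >> Q

feat = rng.uniform(-1, 1, 16)   # e.g. normalized cepstral features (hypothetical)
wts = rng.uniform(-1, 1, 16)

fixed = q15_dot([to_q15(v) for v in feat], [to_q15(v) for v in wts]) / (1 << Q)
exact = float(feat @ wts)
```

With per-value rounding error bounded by half an LSB (about 1.5e-5), a 16-term dot product stays within roughly 1e-3 of the floating-point result, which is consistent with the authors' observation that the fixed-point recognizer matches floating-point performance.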
Title: Concatenating syllables for response generation in spoken language applications
Authors: T. Fung, H. Meng
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859114

We describe our approach to developing a speech synthesis technique for response generation in domain-specific spoken language applications. Our approach handles two Chinese dialects: Cantonese and Putonghua. We chose the foreign exchange domain and worked with its constrained vocabulary and response expressions. The syllable is selected as our basic unit for concatenation. Each unit label includes a two-digit appendix that encodes the distinctive features of the left and right coarticulatory context. Our approach attempts to maximize the intelligibility and naturalness of the responses within the application domain. Hence the synthesized outputs compare favorably with those of a domain-independent TD-PSOLA synthesizer.
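The labeling scheme can be illustrated with a toy lookup. Everything here is hypothetical: the syllable romanizations, the context-class digits, and the fallback rule are invented for illustration, since the paper does not spell out its selection logic at this level.

```python
# Hypothetical label scheme: base syllable plus a two-digit appendix
# encoding left/right coarticulatory context classes.
def unit_label(syllable, left, right):
    return f"{syllable}_{left}{right}"

# Toy inventory of recorded units, keyed by contextual label.
inventory = {
    "nei_02": "<waveform A>",
    "nei_13": "<waveform B>",
    "wui_20": "<waveform C>",
}

def select_unit(syllable, left, right):
    # Prefer an exact contextual match; otherwise fall back to any
    # recorded unit of the same base syllable.
    exact = unit_label(syllable, left, right)
    if exact in inventory:
        return exact
    return next(k for k in inventory if k.startswith(syllable + "_"))

chosen = select_unit("nei", 1, 3)
```

Encoding the coarticulatory context directly in the unit label lets selection reduce to dictionary lookup, which suits a constrained-vocabulary domain like foreign exchange responses.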
Title: Soft GPD for minimum classification error rate training
Authors: Bertram E. Shi, K. Yao, Z. Cao
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861803

Minimum classification error (MCE) rate training is a discriminative training method which seeks to minimize an empirical estimate of the error probability over a training set. The segmental generalized probabilistic descent (GPD) algorithm for MCE uses the log likelihood of the best path as a discriminant function to estimate the error probability. This paper shows that, by using a discriminant function similar to the auxiliary function used in EM, we can obtain a "soft" version of GPD, in the sense that information about all possible paths is retained. Its complexity is similar to that of segmental GPD, and for certain parameter values the algorithm is equivalent to segmental GPD. By modifying the misclassification measure usually used, we obtain an algorithm for embedded MCE training for continuous speech which does not require a separate N-best search to determine competing classes. Experimental results show an error rate reduction of 20% compared with maximum likelihood training.
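The standard MCE machinery behind both segmental and soft GPD is a smoothed misclassification measure: the correct class's discriminant is compared against a soft maximum over competitors, and a sigmoid turns the measure into a differentiable stand-in for the 0/1 loss. The sketch below shows that standard construction (the scores are invented; this is the generic MCE measure, not the paper's EM-style discriminant).

```python
import numpy as np

def misclassification(g, k, eta):
    """Smoothed misclassification measure d_k: negative correct-class
    score plus a soft maximum over competitors; as eta grows the soft
    maximum hardens toward the best competitor (the segmental case)."""
    g = np.asarray(g, float)
    comp = np.delete(g, k)
    return -g[k] + np.log(np.mean(np.exp(eta * comp))) / eta

def mce_loss(d, gamma=1.0):
    # Sigmoid smoothing of the 0/1 loss: ~0 when clearly correct (d << 0),
    # ~1 when clearly wrong (d >> 0).
    return 1.0 / (1.0 + np.exp(-gamma * d))

g = np.array([2.0, 1.5, -0.5])               # correct class 0 has the top score
d_soft = misclassification(g, 0, eta=1.0)    # retains all competitors
d_hard = misclassification(g, 0, eta=200.0)  # ~ -g[0] + max competitor
```

This mirrors the paper's observation that for certain parameter values the soft formulation reduces to segmental GPD: large `eta` collapses the soft maximum onto the single best competing score.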
Title: Infinitely divisible cascade analysis of network traffic data
Authors: D. Veitch, P. Abry, P. Flandrin, P. Chainais
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861931

Infinitely divisible cascades are a model class previously introduced in the field of turbulence to describe the statistics of velocity fields. In this paper, using a wavelet reformulation of the cascades, we investigate their ability to analyze and model the scaling properties of data, and we compare their fundamental ingredients to those of other scaling model classes such as self-similar and multifractal processes. We also propose an estimation procedure for the propagator, or kernel, of the cascades. Finally, the cascade model is successfully applied to describe Internet TCP network traffic data, bringing new insights into their scaling properties and revealing a pitfall in existing techniques.
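The entry point to any wavelet-based scaling analysis is the logscale diagram: wavelet detail energy per dyadic scale, plotted in log-log, whose slope encodes the scaling exponent. The toy below uses crude Haar-style block details on a random walk as a stand-in for a traffic trace; the slope-to-Hurst relation stated in the comment (slope about 2H under this unnormalized convention) is my own back-of-envelope calibration, not the paper's estimator, which targets the far richer cascade propagator.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy self-similar signal: a random walk (Hurst exponent H = 0.5),
# standing in for a cumulative traffic trace.
x = np.cumsum(rng.normal(size=2 ** 14))

# Haar-style detail energy at dyadic scales j = 1..8: difference of
# adjacent half-block means within blocks of length 2**j.
logE = []
for j in range(1, 9):
    n = 2 ** j
    blocks = x[: len(x) // n * n].reshape(-1, n)
    half = n // 2
    d = blocks[:, :half].mean(1) - blocks[:, half:].mean(1)
    logE.append(np.log2(np.mean(d ** 2)))

# Scaling shows up as a straight line in the logscale diagram; for an
# fBm-like signal this unnormalized slope is roughly 2H (assumption).
slope, _ = np.polyfit(np.arange(1, 9), logE, 1)
```

A single straight-line fit like this is exactly what cascade models generalize: they allow the effective exponent to drift across scales, which is the behavior the authors report in TCP traffic.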
Title: Exploring permutation inconsistency in blind separation of speech signals in a reverberant environment
Authors: M. Ikram, D. Morgan
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859141

We study and explore the limitations of methods for blind separation of a mixture of multiple speakers in a real reverberant environment. To support our results, we analyze a frequency-domain method, which achieves blind source separation (BSS) by transforming the time-domain convolutive problem into multiple short-term problems in the frequency domain. We show that treating the problem independently at different frequency bins introduces a "permutation inconsistency" problem, which becomes worse as the length of the room impulse response increases. Our studies show that the ideas proposed in the existing literature cannot effectively handle this problem, and that a satisfactory solution is still needed. We speculate that time-domain BSS techniques may also suffer from an equivalent permutation inconsistency problem when long unmixing filters are used.
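The permutation inconsistency itself is easy to demonstrate without running any ICA: if each frequency bin is separated independently, each bin's outputs come back in an arbitrary order. The toy below scrambles per-bin source envelopes and then applies one of the alignment heuristics discussed in the literature (correlating envelopes across adjacent bins). The envelopes and noise level are invented, and the clean result here is exactly what the paper argues does not survive realistic reverberation.

```python
import numpy as np

rng = np.random.default_rng(2)

T, F = 400, 32
# Hypothetical per-bin amplitude envelopes: two sources with distinct
# temporal activity patterns, shared across all frequency bins.
env = np.abs(np.stack([np.sin(np.linspace(0, 6, T)),
                       np.cos(np.linspace(0, 11, T))]))
S = env[None].repeat(F, axis=0) + 0.05 * rng.random((F, 2, T))  # (F, 2, T)

# Independent per-bin separation leaves an unknown permutation per bin.
perms = rng.integers(0, 2, size=F)  # 0 = identity, 1 = swapped
Y = np.array([s[::-1] if p else s for s, p in zip(S, perms)])

# Alignment heuristic: swap a bin whenever its envelopes correlate
# better with the previous bin's envelopes after swapping.
fixed = Y.copy()
for f in range(1, F):
    keep = sum(np.corrcoef(fixed[f - 1, i], fixed[f, i])[0, 1] for i in range(2))
    swap = sum(np.corrcoef(fixed[f - 1, i], fixed[f, 1 - i])[0, 1] for i in range(2))
    if swap > keep:
        fixed[f] = fixed[f, ::-1]
```

In this noiseless toy the chain of pairwise decisions recovers a globally consistent ordering; with long room impulse responses the per-bin envelopes decorrelate, the pairwise decisions start failing, and errors propagate across bins, which is the failure mode the paper documents.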
Title: Reconstruction of chaotic dynamics using a noise-robust embedding method
Authors: W. Yoshida, S. Ishii, Masa-aki Sato
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861907

In this article, we discuss the reconstruction of chaotic dynamics in a partial observation situation. As a function approximator, we employ a normalized Gaussian network (NGnet), which is trained by an on-line EM algorithm. In order to deal with the partial observation, we propose a new embedding method based on smoothing filters, which we call integral embedding. The NGnet is trained to learn the dynamical system in the integral coordinate space. Experimental results show that the trained NGnet is able to reproduce a chaotic attractor that well approximates the complexity and instability of the original chaotic attractor, even when the data contain substantial noise. Compared with our previous method using delay-coordinate embedding, the new method is more robust to noise and faster in learning.
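The contrast between the two embeddings can be sketched on a scalar chaotic observation. Delay embedding stacks raw lagged samples, so every coordinate carries the full observation noise; a smoothing-filter ("integral") embedding builds coordinates from progressively filtered versions of the signal. The concrete filter below (a first-order leaky integrator applied repeatedly) is an assumption for illustration, not the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar observation of the (chaotic) logistic map, plus observation noise.
x = np.empty(2000)
x[0] = 0.3
for t in range(1999):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])
noisy = x + 0.05 * rng.normal(size=x.size)

def delay_embed(s, dim, tau):
    # Classical delay-coordinate embedding: [s(t), s(t+tau), ...].
    n = len(s) - (dim - 1) * tau
    return np.column_stack([s[i * tau: i * tau + n] for i in range(dim)])

def integral_embed(s, dim, alpha=0.3):
    # Sketch of an integral embedding: coordinate 0 is the observation,
    # each further coordinate is one more pass of a leaky integrator.
    coords, cur = [], s.copy()
    for _ in range(dim):
        coords.append(cur.copy())
        out = np.empty_like(cur)
        acc = 0.0
        for t, v in enumerate(cur):
            acc = (1 - alpha) * acc + alpha * v  # first-order smoother
            out[t] = acc
        cur = out
    return np.column_stack(coords)

D = delay_embed(noisy, 3, 1)
I = integral_embed(noisy, 3)
```

Each extra smoothing pass attenuates the high-frequency observation noise, which is the intuition behind the reported noise robustness: the reconstructed state lives in coordinates where the noise has already been averaged down.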
Title: An oblivious robust digital watermark technique for still images using DCT phase modulation
Authors: Faisal Alturki, R. Mersereau
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859218

Digital watermarking is the process of secretly embedding a short sequence of information inside a digital source without changing its perceptual quality. We present a new oblivious digital watermarking method for copyright protection of still images. The technique is based on modifying the signs of a subset of low-frequency DCT coefficients. Robustness to a number of standard image processing attacks is demonstrated using the criteria of the latest Stirmark test.
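A bare-bones version of sign-based DCT embedding and blind (oblivious) detection looks like this. The keyed coefficient positions and the bit payload are invented, and real schemes add perceptual masking and redundancy for Stirmark-level robustness; this only shows why detection needs no original image.

```python
import numpy as np

rng = np.random.default_rng(4)

def dct_matrix(n):
    # Orthonormal DCT-II basis (rows are basis vectors).
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    M = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
    M[0] /= np.sqrt(2)
    return M

n = 32
C = dct_matrix(n)
img = rng.random((n, n)) * 255
coef = C @ img @ C.T                      # 2-D DCT

# Hypothetical embedding: force the signs of a keyed subset of
# low-frequency coefficients to carry the watermark bits.
key = [(2, 3), (3, 1), (4, 4), (1, 5)]
bits = [1, 0, 1, 1]
for (r, c), b in zip(key, bits):
    mag = abs(coef[r, c])
    coef[r, c] = mag if b else -mag       # sign encodes the bit

marked = C.T @ coef @ C                   # inverse DCT

# Oblivious detection: re-take the DCT of the marked image alone
# and read the signs; the original image is never needed.
coef2 = C @ marked @ C.T
recovered = [int(coef2[r, c] > 0) for r, c in key]
```

Embedding in signs rather than magnitudes is what makes the detector oblivious: a sign survives moderate amplitude distortion (compression, filtering) far better than an exact magnitude does.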
Title: Bias of feedback cancellation algorithms based on direct closed loop identification
Authors: J. Hellgren, U. Forssell
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859098

An adaptive filter can be used to cancel the undesired acoustic feedback in hearing aids. The adaptive algorithm studied in this paper uses the output and input signals of the hearing aid to continuously track the acoustic feedback path. The bias of the optimal estimate under a quadratic norm is analyzed. The results show the importance of having a good model of the input signal to the hearing aid, as errors in this model introduce bias in the estimate of the feedback path.
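The baseline identification setup can be sketched with an NLMS filter estimating the feedback path from the hearing-aid output and microphone signals. The path coefficients, signal levels, and step size are invented. Note the deliberate simplification: the output here is white noise uncorrelated with the external signal, so the estimate converges without bias, whereas the paper's subject is exactly the closed-loop case where the output is correlated with the external input and a poor input-signal model makes the quadratic-norm optimum biased.

```python
import numpy as np

rng = np.random.default_rng(5)

N, L = 5000, 8
# Hypothetical acoustic feedback path (FIR, receiver -> microphone).
f = np.array([0.0, 0.3, -0.2, 0.1, 0.05, 0.0, -0.02, 0.01])
out = rng.normal(size=N)        # hearing-aid output (white, open-loop probe)
ext = 0.1 * rng.normal(size=N)  # external signal at the microphone

w = np.zeros(L)                 # adaptive estimate of the feedback path
mu = 0.2
for t in range(L, N):
    u = out[t - L:t][::-1]              # most recent output samples
    mic = ext[t] + f @ u                # microphone: external + feedback
    e = mic - w @ u                     # prediction error
    w += mu * e * u / (u @ u + 1e-8)    # NLMS update
```

With the external signal acting as uncorrelated noise, `w` settles close to `f`; in actual closed-loop operation the same update inherits a bias term proportional to the correlation between output and external signal, which is what motivates modeling the input signal.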