Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389361
Title: IPA: improved phone modelling with recurrent neural networks
Authors: T. Robinson, M. Hochberg, S. Renals
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing

This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition on the TIMIT task and word recognition on the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate, and all three provide an improvement in the WSJ word recognition rate.
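In hybrid connectionist-HMM systems of this kind, the network's phone posteriors are commonly converted to scaled likelihoods by dividing by the phone priors before Viterbi decoding. A minimal sketch of that conversion step (the three-phone example and the prior values are illustrative, not taken from the paper):

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors):
    """Divide per-frame phone posteriors P(q|x) by the class priors P(q),
    giving likelihoods P(x|q) up to a constant, for use in Viterbi decoding."""
    return posteriors / priors

# Illustrative 3-phone example: two frames of recurrent-net outputs.
post = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])
priors = np.array([0.5, 0.3, 0.2])   # relative phone frequencies in training data
scaled = posteriors_to_scaled_likelihoods(post, priors)
```

Rare phones get boosted and frequent phones attenuated, so the decoder scores frames by acoustic evidence rather than by class frequency.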
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389481
Title: Optimal entropy constrained scalar quantization for exponential and Laplacian random variables
Authors: G. Sullivan

This paper presents solutions to the entropy-constrained scalar quantizer (ECSQ) design problem for two sources commonly encountered in image and speech compression applications: sources having exponential and Laplacian probability density functions. We obtain the optimal ECSQ either with or without an additional constraint on the number of levels in the quantizer. In contrast to prior methods, which require iterative solution of a large number of nonlinear equations, the new method needs only a single sequence of solutions to one-dimensional nonlinear equations (in some Laplacian cases, one additional two-dimensional solution is needed). As a result, the new method is orders of magnitude faster than prior ones. We also show that as the constraint on the number of levels in the quantizer is relaxed, the optimal ECSQ becomes a uniform threshold quantizer (UTQ) for exponential, but not for Laplacian sources.
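As a rough illustration of the uniform threshold quantizer (UTQ) structure that the paper identifies as the limiting optimum for exponential sources, the sketch below encodes with uniform thresholds and reconstructs at the centroid of each cell, exploiting the memorylessness of the exponential density. This is not the paper's design algorithm, only the UTQ structure itself; the step size and rate are arbitrary:

```python
import numpy as np

def utq_encode(x, step):
    """Uniform threshold quantizer: cell index for nonnegative samples."""
    return np.floor(x / step).astype(int)

def utq_decode(idx, step, lam):
    """Centroid reconstruction for an exponential(lam) source: by
    memorylessness, every cell [k*step, (k+1)*step) has the same offset
    1/lam - step/(e^(lam*step) - 1) from its lower edge."""
    return idx * step + 1.0 / lam - step / np.expm1(lam * step)

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # lam = 1 source
step = 0.5
xq = utq_decode(utq_encode(x, step), step, lam=1.0)
mse = float(np.mean((x - xq) ** 2))            # roughly step**2 / 12
```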
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389244
Title: Spectral quantization of cepstral coefficients
Authors: R. Hagen

Studies the cepstral coefficients as a suitable representation of the linear prediction filter for spectral coding purposes. Spectral coding methods in predictive speech coders are usually evaluated using the spectral distance measure. The average spectral distance, combined with a measure of the percentage of spectra with high distortion, is used to predict the perceptual quality when quantizing the prediction filter. The author shows that the spectral distance is equivalent to a squared error in the cepstral domain. Methods for spectral quantization using vector quantization of cepstral coefficients are analyzed. Better results than for quantization of line spectrum frequencies are reported both for single-stage VQ at 11-14 bits and for two-stage VQ at 18-22 bits. It is concluded that the cepstral coefficients are the right representation for LPC spectral coding purposes.
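The cepstral coefficients of an LPC filter can be computed directly from the prediction coefficients with the standard recursion. A sketch, assuming the common convention H(z) = 1 / (1 - sum_k a_k z^{-k}); sign conventions differ between texts, and this is background, not code from the paper:

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients of the LPC model H(z) = 1/(1 - sum_k a[k] z^-k),
    via the recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    (with a_n = 0 for n beyond the model order)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        cn = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            cn += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(cn)
    return c

# Single-pole check: for H(z) = 1/(1 - 0.5 z^-1), c_n = 0.5**n / n.
c = lpc_to_cepstrum([0.5], n_ceps=4)
```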
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389330
Title: Segmentation of speech using speaker identification
Authors: L. Wilcox, Francine R. Chen, Don Kimber, V. Balasubramanian

This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches accuracy using initialization with speaker labeled data.
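The likelihood-ratio distance used for the clustering can be sketched for scalar features and single Gaussians: the distance is large when one shared model fits the two segments much worse than separate per-segment models. The one-dimensional features, sample sizes, and speaker means below are illustrative simplifications of the paper's setup:

```python
import numpy as np

def gauss_loglik(x):
    """Maximized log-likelihood of samples under a Gaussian fitted to them."""
    return -0.5 * len(x) * (np.log(2 * np.pi * np.var(x)) + 1)

def glr_distance(x, y):
    """Generalized likelihood-ratio distance between two segments: how much
    worse a single shared Gaussian fits than separate per-segment Gaussians.
    Always nonnegative; near zero for same-speaker segments."""
    return gauss_loglik(x) + gauss_loglik(y) - gauss_loglik(np.concatenate([x, y]))

rng = np.random.default_rng(0)
seg_a = rng.normal(0.0, 1.0, 200)   # "speaker A"
seg_b = rng.normal(0.0, 1.0, 200)   # same distribution as A
seg_c = rng.normal(5.0, 1.0, 200)   # a very different "speaker"
```

Agglomerative clustering would repeatedly merge the pair of segments with the smallest such distance.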
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389465
Title: A novel tree-structured video coder
Authors: F. D. Natale, G. Desoli, D. Giusto

A novel approach to video coding at very low bit rates is presented. It differs significantly from most previous approaches in that it uses a spline-like interpolation scheme in a spatiotemporal domain. This operator is applied to a non-uniform 3D grid (built on sets of consecutive frames) so as to allocate the information adaptively. The proposed method allows full exploitation of intra/inter-frame correlations and gives good objective and visual quality of the reconstructed sequences.
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389800
Title: On the equivalence between Gamma and Laguerre filters
Authors: T. E. O. Silva

Proves the equivalence between the Gamma and Laguerre filters. Applying the optimality conditions for Gamma filters, which are easy to obtain, the author arrives at the optimality conditions for Laguerre filters. Curiously, these conditions are the same as those of a truncated Laguerre series approximation, which corresponds to the use of an impulse as the input of the Laguerre filter. The author illustrates these results with an example and also investigates the relative merits of both structures in an adaptive filter setup.
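A Gamma filter is a tap line of identical leaky integrators; for mu = 1 it degenerates to an ordinary FIR tapped delay line, and the Laguerre filter differs by an all-pass normalization of the stages. A minimal sketch of the standard Gamma structure (background, not code from the paper):

```python
def gamma_filter(x, w, mu):
    """Gamma filter: g_0[n] = x[n] and, for k >= 1,
    g_k[n] = (1 - mu) * g_k[n-1] + mu * g_{k-1}[n-1];
    the output is the weighted sum of the taps, y[n] = sum_k w[k] * g_k[n]."""
    K = len(w)
    g = [0.0] * K            # g[0] = current input, g[1:] = integrator states
    y = []
    for xn in x:
        g_prev = g[:]        # states at time n-1
        g[0] = xn
        for k in range(1, K):
            g[k] = (1 - mu) * g_prev[k] + mu * g_prev[k - 1]
        y.append(sum(wk * gk for wk, gk in zip(w, g)))
    return y

# With mu = 1 each stage is a pure unit delay, i.e. an FIR filter.
impulse = [1.0, 0.0, 0.0, 0.0]
y = gamma_filter(impulse, w=[1.0, 2.0, 3.0], mu=1.0)
```

For 0 < mu < 1 the memory depth of the tap line stretches beyond its order, which is the point of the structure.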
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389510
Title: Statistical analysis of the median based multi-shell order-statistics filters
Authors: J. J. Li, A. Ramsingh

The multi-shell median filters have been shown to be effective in preserving image details as well as in suppressing impulsive noise. In this paper, a statistical analysis of a general class of median-based multi-shell order-statistics filters is presented. Using statistical threshold decomposition together with a tri-tree structure, the statistical properties of the filters are derived. Based on these results, a 2-D nonlinear filter that offers a good compromise between noise attenuation and detail preservation can be obtained for various applications.
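As background for the analysis, a plain 3x3 median filter shows the basic impulse-suppressing operation that multi-shell variants build on; the paper's actual multi-shell structure and tri-tree decomposition are not reproduced here:

```python
import numpy as np

def median3x3(img):
    """Plain 3x3 median filter; border pixels are left unchanged."""
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

# A single impulse on a flat background is removed entirely:
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
clean = median3x3(img)
```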
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389867
Title: Spectrum reuse using transmitting antenna arrays with feedback
Authors: D. Gerlach, A. Paulraj

Currently, a central base station communicates simultaneously with several mobile users by allocating a separate time or frequency channel for each mobile to prevent undesired crosstalk. However, each time or frequency channel may be reused among several mobiles by means of an antenna array at the base station that points a separate beam at each user. The downlink beamformer would normally operate in an "open loop" mode, in which the base steers a mainlobe in the direction of each mobile. Such a system may operate effectively in a free-space environment with no multipath; in the presence of scattering, open loop methods will not perform adequately. A new "closed loop" technique is presented in which each mobile user feeds back to the base estimates of the received signal amplitudes. Using feedback, the base station can achieve precision beamforming, resulting in lower crosstalk and improved signal separation even in strongly scattering environments.
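One simple way to see what accurate channel knowledge buys the base station: with each user's channel vector known (e.g. from feedback), zero-forcing transmit weights can null crosstalk exactly. The 3-element array and the channel values below are hypothetical, and zero-forcing is one illustrative choice rather than the paper's method:

```python
import numpy as np

# Hypothetical downlink: 3 base-station antennas, 2 users. Row i of H is
# user i's channel vector as estimated from that user's feedback.
H = np.array([[1.0 + 0.2j, 0.5 - 0.1j, 0.3 + 0.4j],
              [0.2 - 0.3j, 1.1 + 0.0j, 0.6 + 0.2j]])

# Zero-forcing transmit weights: column i of W is the beamforming vector
# for user i, chosen so each user hears its own signal with unit gain
# and zero crosstalk from the other user's signal.
W = np.linalg.pinv(H)        # shape (3, 2)

coupling = H @ W             # ideally the 2x2 identity matrix
```

Open-loop steering toward a nominal direction cannot achieve this cancellation when scattering makes the true channel differ from the assumed one, which is the motivation for feedback.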
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.390039
Title: Channel equalization with perceptrons: an information-theoretic approach
Authors: T. Adalı, M. Sönmez

We formulate adaptive channel equalization as a conditional probability distribution learning problem. The conditional probability density function of the transmitted signal given the received signal is parametrized by a sigmoidal perceptron. In this framework, we use the relative entropy (Kullback-Leibler distance) between the true and the estimated distributions as the cost function to be minimized. The true probabilities are approximated by their stochastic estimators, resulting in a stochastic relative entropy cost function. This function is well-formed in the sense of Wittner and Denker (1988); therefore gradient descent on this cost function is guaranteed to find a solution. The consistency and asymptotic normality of this learning scheme are shown via maximum partial likelihood estimation of logistic models. As a practical example, we demonstrate that the resulting algorithm successfully equalizes multipath channels.
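The scheme can be sketched as a sigmoidal perceptron on a short window of received samples, trained by gradient descent on the cross-entropy (the sample estimate of the relative-entropy cost). The channel model, window length, learning rate, and iteration count below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
bits = rng.integers(0, 2, n)
s = 2.0 * bits - 1.0                      # BPSK symbols in {-1, +1}
# Illustrative dispersive channel: current symbol plus 0.4x the previous
# symbol plus Gaussian noise.
r = s + 0.4 * np.concatenate(([0.0], s[:-1])) + 0.1 * rng.standard_normal(n)

# Features: a 3-tap window of received samples around each symbol
# (np.roll wraps at the ends, so the first and last samples are dropped).
X = np.stack([np.roll(r, -1), r, np.roll(r, 1)], axis=1)[1:-1]
y = bits[1:-1]

# Sigmoidal perceptron trained by gradient descent on the cross-entropy,
# the sample estimate of the relative-entropy cost function.
w, b = np.zeros(3), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= X.T @ (p - y) / len(y)
    b -= float(np.mean(p - y))

accuracy = float(np.mean((p > 0.5) == y))
```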
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389348
Title: The voice across Japan database - the Japanese language contribution to Polyphone
Authors: Thomas Staples, J. Picone, Nozomi Arai

Texas Instruments' Voice Across Japan (VAJ) database, modeled after the highly successful Voice Across America project, consists of a wide range of speech material including digit strings, yes/no questions, and phonetically rich read sentences. The data is being collected over long-distance telephone lines through an analog telephone interface. The target size is 14 items per speaker across 10,000 speakers. Greater emphasis is being placed on the collection of phonetically rich read sentence data. Four randomly selected sentences are included in each session: one from the 512-sentence ATR PB set, and three from a 10,000-sentence set developed specifically for this project. This latter sentence set, designed to maximize the triphone coverage of the database, is described. The VAJ database is planned to be included in the Linguistic Data Consortium's (LDC) Polyphone (multi-language) database.
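A sentence set that maximizes triphone coverage is often built greedily: repeatedly pick the sentence that contributes the most unseen triphones. The toy phone-string corpus below is hypothetical, and the paper's actual selection procedure is not specified here; this is just one common way to realize the stated design goal:

```python
def triphones(sentence):
    """All overlapping 3-phone sequences in a space-separated phone string."""
    phones = sentence.split()
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}

def greedy_select(corpus, k):
    """Greedily choose k sentences maximizing cumulative triphone coverage."""
    covered, chosen = set(), []
    pool = list(corpus)
    for _ in range(k):
        best = max(pool, key=lambda s: len(triphones(s) - covered))
        pool.remove(best)
        chosen.append(best)
        covered |= triphones(best)
    return chosen, covered

# Toy corpus of phone strings; "a b c" is redundant once "a b c d" is chosen.
corpus = ["a b c d", "a b c", "d e f g"]
chosen, covered = greedy_select(corpus, k=2)
```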