
Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Publications

A new cumulant based parameter estimation method for noncausal autoregressive systems
Chong-Yung Chi, J. Hwang, C. Rau
In this paper, a new nonlinear parameter estimation method is described for a noncausal autoregressive (AR) system, based on a new quadratic equation relating the unknown AR parameters to higher-order (≥3) cumulants of non-Gaussian output measurements in the presence of additive Gaussian noise. The method is applicable whether or not the order of the system is known in advance, and it also applies to the case of a causal AR system. Simulation results are offered to show that the proposed method is effective.
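The abstract does not give the quadratic equation itself, so the following is only a minimal sketch of one ingredient such cumulant-based methods rely on: a sample estimate of the third-order cumulant of a zero-mean sequence, which is asymptotically insensitive to additive Gaussian noise. The function name and the AR(1) test signal are illustrative assumptions, not the authors' estimator.

```python
import numpy as np

def third_order_cumulant(x, tau1, tau2):
    """Biased sample estimate of C3x(tau1, tau2) = E[x(n) x(n+tau1) x(n+tau2)]
    for a zero-mean sequence x and non-negative lags.  Gaussian noise has zero
    third-order cumulants, so this statistic is asymptotically unaffected by it."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    m = n - max(tau1, tau2, 0)          # usable sample range
    return np.sum(x[:m] * x[tau1:tau1 + m] * x[tau2:tau2 + m]) / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical causal AR(1) process driven by a skewed (non-Gaussian) i.i.d. input.
    e = rng.exponential(1.0, 20000) - 1.0      # zero mean, nonzero skewness
    x = np.zeros_like(e)
    for n in range(1, len(e)):
        x[n] = 0.6 * x[n - 1] + e[n]
    y = x + rng.normal(0.0, 1.0, len(x))       # additive Gaussian noise
    # The two estimates should be close despite the added Gaussian noise.
    print(third_order_cumulant(x, 1, 1), third_order_cumulant(y, 1, 1))
```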
Citations: 11
Using Gaussian mixture modeling in speech recognition
Yaxin Zhang, M. Alder, R. Togneri
The paper describes a speaker-independent isolated word recognition system which uses a well-known technique, the combination of vector quantization with hidden Markov modeling. In this system, the conventional vector quantization algorithm is replaced by a statistical clustering algorithm, the expectation-maximization algorithm. Based on an investigation of the data space, phonemes were manually extracted from the training data and used to generate the Gaussians in a codebook in which each code word is a Gaussian rather than a centroid vector of the data class. Word-based hidden Markov modeling was then performed. Two English isolated-digit databases were investigated, with 12 Mel-spaced filter-bank coefficients employed as the input feature. Compared with the conventional discrete HMM, the present system obtained a significant improvement in recognition accuracy.
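A minimal sketch of the central idea only: replacing a centroid codebook by Gaussian code words fitted with the EM algorithm. The use of scikit-learn's GaussianMixture and of random vectors standing in for 12-dimensional Mel filter-bank frames are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for 12-dimensional Mel-spaced filter-bank frames; in the paper these
# would come from manually extracted phoneme segments of the training data.
frames = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(500, 12)),
    rng.normal(loc=3.0, scale=0.5, size=(500, 12)),
])

# Each "code word" is a Gaussian (mean + diagonal covariance) fitted by EM,
# rather than a centroid produced by conventional vector quantization.
codebook = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
codebook.fit(frames)

# For an HMM front end, a frame can then be mapped to its most likely Gaussian
# (a discrete symbol) or scored by its log-likelihood under the mixture.
test_frame = rng.normal(loc=3.0, scale=0.5, size=(1, 12))
print("code word index:", codebook.predict(test_frame)[0])
print("log-likelihood:", codebook.score_samples(test_frame)[0])
```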
Citations: 20
Generalized magnitude and power complementary filters
S. R. Pillai, G. H. Allen
A new higher-order generalization of magnitude and power complementary filters is proposed in this paper. The proposed scheme is shown to have superior frequency characteristics compared with ordinary complementary filters. Applications of these generalized complementary filters include subband coding for audio and video, and sharpening of the amplitude characteristics of digital filters. Interestingly, this new design procedure can be used to generate ordinary multichannel magnitude and power complementary filters with sharper band responses.
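The higher-order construction itself is not reproduced in the abstract; as a point of reference, the sketch below only checks the ordinary power-complementary property |H0(e^jω)|² + |H1(e^jω)|² = 1 for the simplest two-channel pair (first-order sum/difference filters), which is an assumed example rather than the proposed design.

```python
import numpy as np
from scipy.signal import freqz

# Simplest power-complementary pair: H0(z) = (1 + z^-1)/2, H1(z) = (1 - z^-1)/2.
h0 = np.array([0.5, 0.5])
h1 = np.array([0.5, -0.5])

w, H0 = freqz(h0, worN=512)
_, H1 = freqz(h1, worN=512)

# Power complementarity: the squared magnitude responses sum to one at all frequencies.
power_sum = np.abs(H0) ** 2 + np.abs(H1) ** 2
print("max deviation from 1:", np.max(np.abs(power_sum - 1.0)))
```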
Citations: 7
Segmental phoneme recognition using piecewise linear regression
S. Krishnan, P. Rao
We propose an efficient, self-organizing segmental measurement based on a piecewise linear regression (PLR) fit of the short-term measurement trajectories. The advantages of this description are: (i) it serves to decouple temporal measurements from the recognition strategy; and (ii) it leads to less computation than conventional methods. Also, acoustic context can be easily integrated into this framework. The PLR measurements are cast into a stochastic segmental framework for phoneme classification. We show that this requires static classifiers for each regression component. Finally, we evaluate this approach on the phoneme recognition task using the TIMIT database. The results show that the PLR description leads to a computationally simple alternative to existing approaches.
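As a rough illustration of what a PLR fit of a short-term measurement trajectory looks like, the sketch below fits one straight line per fixed-length segment of a one-dimensional feature track by ordinary least squares. The fixed segment boundaries and the toy trajectory are assumptions; the paper's self-organizing segmentation is not specified in the abstract.

```python
import numpy as np

def plr_fit(trajectory, seg_len):
    """Fit one line (slope, intercept) per consecutive segment of the trajectory.
    Returns a list of (slope, intercept, start, stop) tuples."""
    fits = []
    for start in range(0, len(trajectory) - 1, seg_len):
        stop = min(start + seg_len, len(trajectory))
        if stop - start < 2:
            break                       # not enough points to fit a line
        t = np.arange(start, stop)
        slope, intercept = np.polyfit(t, trajectory[start:stop], deg=1)
        fits.append((slope, intercept, start, stop))
    return fits

# Toy trajectory: e.g. one cepstral coefficient over 30 frames (rise, plateau, fall).
frames = np.concatenate([np.linspace(0, 1, 10), np.full(10, 1.0), np.linspace(1, -0.5, 10)])
for slope, intercept, start, stop in plr_fit(frames, seg_len=10):
    print(f"frames {start:2d}-{stop - 1:2d}: slope={slope:+.3f}, intercept={intercept:+.3f}")
```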
Citations: 9
Detection of nonstationary random signals in colored noise
W. Padgett, Douglas B. Williams
This paper describes a novel method for detecting nonstationary signals in colored noise. A first-order complex autoregressive, or AR(1), signal model is used, which restricts the application of the detector to low-order signals, i.e., those which are well modeled by a low-order AR process and have only a single spectral peak. The detector assumes the noise covariance is stationary and known. The likelihood function is estimated in the frequency domain, where the model simplifies, and the nonstationary frequency estimate can be obtained by an algorithm which approximates the Viterbi algorithm. The AR model parameters are then used to form the appropriate covariance matrix, and the approximate likelihood is calculated. The detector thus uses efficient approximations to the generalized likelihood ratio test (GLRT). Simulation results are shown comparing the detector with the known-signal likelihood ratio test.
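The AR(1)/Viterbi-based approximation cannot be reconstructed from the abstract alone; the sketch below only shows the kind of baseline it is compared against, a likelihood ratio test for a zero-mean Gaussian signal with known covariance in colored Gaussian noise with known covariance. Both covariance choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

# Known colored-noise covariance: AR(1)-like correlation, unit variance.
rho = 0.8
Cn = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# Assumed covariance of a narrowband random signal (rank-2, positive semidefinite).
f0 = 0.1
t = np.arange(N)
Cs = 0.5 * np.cos(2 * np.pi * f0 * np.subtract.outer(t, t))

def log_likelihood_ratio(x, Cn, Cs):
    """Log likelihood ratio for H1: x ~ N(0, Cs+Cn) versus H0: x ~ N(0, Cn)."""
    C1 = Cs + Cn
    quad = 0.5 * (x @ np.linalg.solve(Cn, x) - x @ np.linalg.solve(C1, x))
    logdet = 0.5 * (np.linalg.slogdet(Cn)[1] - np.linalg.slogdet(C1)[1])
    return quad + logdet

noise = rng.multivariate_normal(np.zeros(N), Cn)
signal_plus_noise = rng.multivariate_normal(np.zeros(N), Cs + Cn)
print("H0 sample LLR:", log_likelihood_ratio(noise, Cn, Cs))
print("H1 sample LLR:", log_likelihood_ratio(signal_plus_noise, Cn, Cs))
```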
Citations: 1
A frequency domain filtering method for generation of long complex Gaussian sequences with required spectra
Weimin Zhang
The frequency-domain filtering method treats the i.i.d. Gaussian samples as residing in the frequency domain, so the generation of out-of-band samples is not necessary. It is efficient when the Doppler bandwidth is low compared with the sampling rate, such as in simulations of multipath fading channels. A proposed time-domain smooth joining scheme keeps the mean and variance unchanged and controls the power spectral density distortion. The data window used for joining the blocks is a cosine window, and the resulting autocorrelation window ranges from triangular to Papoulis, depending on the degree of overlap. The spectrum distortions are a trade-off against computational efficiency. The method can be seen as the reciprocal problem of power spectrum estimation.
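As a sketch of the general frequency-domain approach only (not the paper's smooth-joining scheme for long sequences), one can draw i.i.d. complex Gaussian samples directly as frequency-domain coefficients, shape them by the square root of the desired power spectral density, and take an inverse FFT. The Gaussian-shaped Doppler spectrum and the block length below are illustrative assumptions.

```python
import numpy as np

def colored_gaussian_sequence(psd, rng):
    """Generate one block of a complex Gaussian sequence whose power spectral
    density follows `psd` (length-N array defined over the FFT bins)."""
    N = len(psd)
    # i.i.d. unit-variance complex Gaussian samples, interpreted as frequency-domain values.
    freq = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2.0)
    shaped = freq * np.sqrt(psd)
    # Scale so that the time-domain average power equals mean(psd) (Parseval's relation).
    return np.fft.ifft(shaped) * np.sqrt(N)

rng = np.random.default_rng(0)
N = 4096
f = np.fft.fftfreq(N)                       # normalized frequencies matching the FFT bins
fd = 0.01                                   # assumed low Doppler bandwidth vs. the sample rate
psd = np.exp(-(f / fd) ** 2)                # illustrative Gaussian-shaped Doppler spectrum
x = colored_gaussian_sequence(psd, rng)
print("mean power:", np.mean(np.abs(x) ** 2), "target:", np.mean(psd))
```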
Citations: 4
Speech enhancement based on a new set of auditory constrained parameters
S. Nandkumar, J. Hansen
A speech enhancement technique is proposed in which auditory-based properties of perception are investigated for the purpose of robust speech characterization and improved speech quality in additive background noise. Constraints based on a novel auditory spectral representation are developed in a dual-channel iterative Wiener filtering framework. The aspects of audition modeled by the spectral representation include critical-band filtering, intensity-to-loudness conversion, and lateral inhibition. The auditory transformations and perceptual constraints are shown to result in an improved set of auditory constrained and enhanced linear prediction (ACE-LP) parameters. Objective measures and informal listening tests show improved speech quality for both white Gaussian and colored noise. The consistency of the speech quality improvement is illustrated over time and across all phonemes for a set of phonetically labeled TIMIT database sentences.
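The auditory-constrained, dual-channel iterative scheme is not specified in the abstract; for orientation only, here is a minimal single-channel Wiener-style spectral gain in the STFT domain, assuming the noise power spectrum is estimated from a leading noise-only segment. The noise-segment length, STFT settings, and toy signal are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_seconds=0.5, nperseg=512):
    """Very basic spectral Wiener gain G = snr / (snr + 1), with the noise power
    spectrum estimated from an assumed noise-only leading segment of the input."""
    f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
    noise_frames = max(1, int(noise_seconds * fs / (nperseg // 2)))
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    # Crude a-priori SNR estimate via power spectral subtraction, floored at zero.
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = snr / (snr + 1.0)
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced[: len(noisy)]

# Toy example: a sinusoid in white Gaussian noise, with a noise-only first half second.
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
clean[: fs // 2] = 0.0
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=clean.size)
print(wiener_enhance(noisy, fs).shape)
```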
Citations: 11
3-D image coding based on affine transform
T. Fujii, H. Harashima
This paper is concerned with the data compression and interpolation of multi-view images. We propose affine-based disparity compensation founded on a geometric relationship. We first investigate the geometric relationship between a point in object space and its projection onto a view image. We then propose disparity compensation based on the affine transform, which utilizes the geometric constraints between view images. In this scheme, multi-view images are compressed into the structure and texture of triangular patches. Because the geometric relationship is taken into account, the scheme not only compresses the multi-view image but also synthesizes view images from any viewpoint in the viewing zone. Finally, we report an experiment in which 19 view images were used as the original multi-view image and the amount of data was reduced to 1/19 with an SNR of 34 dB.
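The sketch below only shows the elementary building block assumed by such patch-based schemes: solving the six affine parameters that map the three vertices of a triangular patch in one view onto the corresponding vertices in another view, so that the patch texture can be warped between views. The coordinates are made-up examples, not data from the paper.

```python
import numpy as np

def affine_from_triangle(src, dst):
    """Solve the 2x3 affine transform mapping the 3 source vertices (3x2 array)
    onto the 3 destination vertices (3x2 array):  dst = A @ src + b."""
    A = np.zeros((6, 6))
    rhs = np.zeros(6)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
        rhs[2 * i] = xp
        rhs[2 * i + 1] = yp
    p = np.linalg.solve(A, rhs)
    return p.reshape(2, 3)                     # [[a11, a12, b1], [a21, a22, b2]]

def apply_affine(M, points):
    points = np.asarray(points, dtype=float)
    return points @ M[:, :2].T + M[:, 2]

src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])     # triangular patch in view 1
dst = np.array([[2.0, 1.0], [12.5, 1.5], [1.5, 11.0]])     # same patch seen in view 2
M = affine_from_triangle(src, dst)
print(apply_affine(M, src))                                 # reproduces dst
```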
Citations: 3
On the importance of the microphone position for speech recognition in the car
J. Smolders, T. Claes, Gert Sablon, Dirk Van Compernolle
One of the problems with speech recognition in the car is the position of the far-talk microphone. This position implies not only more or less noise, coming from the car (engine, tires, ...) or from other sources (traffic, wind noise, ...), but also a different acoustical transfer function. In order to compare microphone positions in the car, we recorded a multispeaker database in a car at 7 different positions and compared them on the basis of SNR and recognition rate. The position at the ceiling right in front of the speaker gave the best results.
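A minimal sketch of the kind of SNR figure used in such a comparison, assuming each position's recording has been split into a speech-plus-noise segment and a noise-only segment; the segmentation and the toy signals are assumptions, since the abstract does not describe the measurement procedure.

```python
import numpy as np

def snr_db(speech_segment, noise_segment):
    """Global SNR in dB from a speech-plus-noise segment and a noise-only segment
    recorded at the same microphone position (noise power is subtracted out)."""
    p_noisy = np.mean(np.asarray(speech_segment, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise_segment, dtype=float) ** 2)
    p_speech = max(p_noisy - p_noise, 1e-12)
    return 10.0 * np.log10(p_speech / p_noise)

# Toy example standing in for one of the 7 microphone positions.
rng = np.random.default_rng(0)
noise = 0.05 * rng.normal(size=16000)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000) + 0.05 * rng.normal(size=16000)
print(f"SNR: {snr_db(speech, noise):.1f} dB")
```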
Citations: 19
Simultaneous 3-D motion estimation and wire-frame model adaptation including photometric effects for knowledge-based video coding
G. Akar, A. Tekalp, L. Onural
We address the problem of 3-D motion estimation in the context of knowledge-based coding of facial image sequences. The proposed method handles global and local motion estimation and the adaptation of a generic wire-frame to a particular speaker simultaneously, within an optical-flow-based framework that includes the photometric effects of motion. We use a flexible wire-frame model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced and then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm is used to determine optimum global motion estimates and the parameters describing the structure of the wire-frame model. For the initialization of the motion and structure parameters, a modified feature-based algorithm is used. Experimental results with simulated facial image sequences are given.
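The full scheme (wire-frame adaptation, photometric terms, stochastic relaxation) cannot be reconstructed from the abstract; as a loose illustration of the optical-flow side only, the sketch below estimates a global 2-D affine motion field by least squares from the brightness-constancy constraint Ix·u + Iy·v + It ≈ 0. The synthetic image pair and the affine parameterization are assumptions.

```python
import numpy as np

def global_affine_motion(I0, I1):
    """Least-squares 2-D affine motion (u, v) = (a1 + a2 x + a3 y, a4 + a5 x + a6 y)
    from the optical-flow constraint Ix*u + Iy*v + It ~= 0 over the whole frame."""
    Iy, Ix = np.gradient(I0.astype(float))         # gradients along rows (y) and columns (x)
    It = I1.astype(float) - I0.astype(float)
    h, w = I0.shape
    y, x = np.mgrid[0:h, 0:w]
    # Each pixel contributes one linear equation in the six affine parameters.
    A = np.stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y], axis=-1).reshape(-1, 6)
    b = -It.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                                   # [a1, a2, a3, a4, a5, a6]

# Synthetic example: a smooth image shifted by roughly (1, 0.5) pixels.
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w]
I0 = np.sin(0.2 * x) + np.cos(0.15 * y)
I1 = np.sin(0.2 * (x - 1.0)) + np.cos(0.15 * (y - 0.5))
print(global_affine_motion(I0, I1))                 # a1 ~ 1.0, a4 ~ 0.5, others ~ 0
```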
Citations: 7