Title: Estimating the number of sinusoids in additive white-noise
Authors: J. Fuchs
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169582
Abstract: For a random process that can be modeled as a sum of real sinusoids in white noise, we address the problem of estimating the number of sinusoids. The proposed test uses the eigen-decomposition of the estimated autocorrelation matrix and is based on matrix perturbation analysis. The estimator is shown to resolve closely spaced sinusoids at quite low signal-to-noise ratios.
Title: A transform based covariance differencing approach to bearing estimation
Authors: S. Prasad, Ronald T. Williams, Arijit K. Mahalanabis, L. Sibul
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169850
Abstract: In recent years a new and very powerful technique for parameter estimation - the eigenstructure, or signal subspace, method - has been developed. Eigenstructure algorithms are closely related to Pisarenko's method for estimating the frequencies of sinusoids in white Gaussian noise. In theory they yield asymptotically unbiased estimates of arbitrarily close parameters, independent of the signal-to-noise ratio (SNR). Although signal subspace methods have proven to be powerful tools, they are not without drawbacks. An important weakness of all signal subspace algorithms is their need to know the noise covariance explicitly. The important problem of developing signal-subspace-based procedures for signals in noise fields with unknown covariance has not been satisfactorily addressed. We propose a solution to the problem of direction-of-arrival (DOA) estimation for a broad class of unknown noise fields, and then briefly discuss other important estimation problems to which modified versions of this procedure can be applied.
Title: Variable region vector quantization, space warping and speech/image compression
Authors: Y. Matsuyama
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169359
Abstract: Algorithms for vector quantization of variable-region data are given, and the design iteration is proved to converge. An important issue here is the step that optimizes the region shape with respect to the vector quantization codebook; the presented design method is thus a nontrivial extension of ordinary vector quantizer design, which contains the classical Lloyd-Max algorithm. The main algorithm is first given without introducing any physical entity, so the method is applicable to any data, including speech and images, as long as the quantization distortion is defined. In the speech coding case, which is the main body of this paper, the region shape optimization is interpreted as an epoch interval adjustment. Selecting the adjusted epochs with respect to the vector quantization codebook considerably reduces the quantizing distortion, which enables very-low-rate speech compression. The image coding case is then formulated and a related convergence problem is discussed.
Title: Speech recognition with very large size dictionary
Authors: B. Mérialdo
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169731
Abstract: This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition. MLD proceeds in three steps:
- a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables matching the acoustic signal from a given time frame;
- from this list, a Word Match procedure uses the dictionary to build partial word hypotheses;
- a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until complete sentences are found.
An original matching algorithm is proposed for the Syllable Match procedure. The strategy is evaluated on a dictation task of French texts with two different dictionaries: one composed of the 10,000 most frequent words, the other composed of 200,000 words. The recognition results are given and compared. The word error rate with 10,000 words is 17.3%; if errors due to the lack of coverage are not counted, it is reduced to 10.6%. The error rate with 200,000 words is 12.7%.
Title: Vector predictive quantization of the spectral parameters for low rate speech coding
Authors: Y. Shoham
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169360
Abstract: Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra using a predictor codebook, and the residual spectrum is coded with a residual codebook. The system operates in the log-spectral domain on a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 kbit/s CELP coder with a VPQ system allows the rate assigned to the LPC data to be reduced from 1.8 kbit/s to 1.0 kbit/s without any obvious difference in perceptual quality.
Title: A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise
Authors: F. Soong, M. Sondhi
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169899
Abstract: The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights high-SNR frequency regions more than low-SNR regions. For the weighting function we choose a "bandwidth-broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of the SNR and becomes essentially constant in the noise-free case. The new measure has the dot-product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments, with additive white Gaussian noise used to simulate SNR conditions from 5 dB to noise-free. The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion gives 27.6%; the equivalent SNR improvement at low SNRs is about 5 to 7 dB.
Title: Lemniscate transform: A new efficient technique for shape coding and representation
Authors: A. Kundu
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169666
Abstract: This paper presents a new algorithm for visual shape coding and representation based on a powerful theorem on algebraic curves. The algorithm codes a closed, non-self-intersecting shape by the foci of the closed curve and a distance parameter 'p', such that the product of the distances from any point on the shape to the foci is approximately constant and equal to 'p'. The computation of the foci coordinates and the parameter 'p' is posed as a linearized least-squares problem. The reconstruction algorithm is a straightforward implementation of the theorem. Experimental results indicating the success of the algorithm are also provided.
Title: Subband/Transform coding using filter bank designs based on time domain aliasing cancellation
Authors: J. Princen, A. Johnson, A. B. Bradley
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169405
Abstract: A new, oddly stacked, critically sampled, single side-band (SSB) [7] analysis/synthesis system based on Time Domain Aliasing Cancellation (TDAC) [1],[2] is described in this paper. The specifications for the analysis and synthesis filter responses are developed and a number of designs which satisfy the reconstruction requirements are described. The application of TDAC systems to Subband/Transform coding is also discussed and the objective performance of a 32 band coder using several different window designs is presented and compared with a coder based on Frequency Domain Aliasing Cancellation (FDAC) filter banks [3]-[5].
Title: A criticism of the parametric EEG spike detector
Authors: M. Beddoes, L. Panych, Juan Qian, J. Wada
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169635
Abstract: The role of the parametric stage is studied under various conditions and the following points are demonstrated:
- With a high sampling rate (200 Hz) but otherwise favourable conditions, the signal-to-noise ratio at the output of the parametric stage remains the same as at the input as the filter order p is increased from zero to nineteen. Under less favourable conditions, it can fall as p is increased.
- Comparable performance (in terms of spikes detected) is obtained when the parametric stage is omitted entirely and detection is based only on the very simple non-parametric stage.
Title: Vector quantization firmware for an acoustical front-end using the TMS32020
Authors: A. Ciaramella, G. Venuti
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169338
Abstract: We describe the firmware implementation of an acoustical front-end that performs vector quantization of Discrete Cosine Transform (DCT) coefficients for a speech recognition system. The firmware runs on a single TMS32020 signal-processor chip and is characterized both by real-time performance and by good accuracy.