Example-based query generation for spontaneous speech
H. Murao, Nobuo Kawaguchi, S. Matsubara, Y. Inagaki
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034639
This paper proposes a new query generation method based on examples of human-to-human dialogue. Along with a model of the information flow in dialogue, an in-car information retrieval system has been designed. The system searches the dialogue corpus for an example similar to the input speech and builds a query from that example. Experimental results are presented to show the effectiveness of the method.
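The retrieval-then-reuse idea can be sketched as follows. This is a hypothetical illustration, not the authors' system: the corpus entries, the query format, and the word-overlap (Jaccard) similarity are all invented stand-ins for whatever representation and similarity measure the paper actually uses.

```python
# Toy sketch of example-based query generation: find the corpus example
# most similar to the input utterance and reuse that example's query.
# Similarity here is simple word overlap (Jaccard), an assumed stand-in.

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def generate_query(utterance, examples):
    """examples: list of (utterance, query) pairs from a dialogue corpus."""
    best = max(examples, key=lambda ex: jaccard(utterance, ex[0]))
    return best[1]

examples = [
    ("is there an italian restaurant nearby",
     "SELECT restaurant WHERE cuisine=italian"),
    ("find a gas station on this road",
     "SELECT gas_station WHERE route=current"),
]
print(generate_query("any italian restaurant near here", examples))
# -> SELECT restaurant WHERE cuisine=italian
```

In a real system the similarity measure would have to be robust to recognition errors in the input speech, which plain word overlap is not.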
Error analysis using decision trees in spontaneous presentation speech recognition
T. Shinozaki, S. Furui
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034621
This paper proposes the use of decision trees for analyzing errors in spontaneous presentation speech recognition. The trees are designed to predict whether a word or phoneme will be recognized correctly, using word or phoneme attributes as inputs. The trees are constructed from training "cases" by choosing questions about attributes step by step according to the gain ratio criterion. Errors in recognizing spontaneous presentations given by 10 male speakers were analyzed, and the power of each attribute to explain the recognition errors was quantitatively evaluated. A restricted set of attributes closely related to the recognition errors was identified for both words and phonemes.
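The gain ratio criterion mentioned above (familiar from C4.5-style decision trees) can be computed as follows. The attribute and outcome data are invented; the paper's actual attributes (word/phoneme properties) are not reproduced here.

```python
import math

# Minimal sketch of the gain-ratio criterion used to pick decision-tree
# questions: information gain of a split, normalized by the split's own
# entropy (split information), as in C4.5.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)) if c)

def gain_ratio(cases, labels, attribute):
    """cases: list of attribute dicts; labels: e.g. 'ok'/'err' outcomes."""
    n = len(cases)
    base = entropy(labels)
    cond, split_info = 0.0, 0.0
    for value in set(c[attribute] for c in cases):
        subset = [l for c, l in zip(cases, labels) if c[attribute] == value]
        p = len(subset) / n
        cond += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = base - cond
    return gain / split_info if split_info else 0.0

# A perfectly separating binary attribute yields a gain ratio of 1.0.
cases = [{"short": 1}, {"short": 1}, {"short": 0}, {"short": 0}]
labels = ["err", "err", "ok", "ok"]
print(gain_ratio(cases, labels, "short"))  # -> 1.0
```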
Beyond the Informedia digital video library: video and audio analysis for remembering conversations
Alexander Hauptmann, Wei-Hao Lin
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034646
The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system that listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner, and remembers his or her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: "That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition and speaker identification technology have serious shortfalls that must be overcome.
Robust and efficient confidence measure for isolated command recognition
G. Hernández-Abrego, X. Menéndez-Pidal, L. Olorenshaw
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034681
A new confidence measure for isolated command recognition is presented. It is versatile and efficient in two ways: it is based exclusively on the speech recognizer's output, and it is robust to changes in the vocabulary, acoustic model, and parameter settings. Its calculation is very simple, based on the computation of a pseudo-filler score from an N-best list. Performance is tested in two different command recognition applications. The measure effectively separates correct results both from incorrect ones and from false alarms caused by out-of-vocabulary elements and noise.
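One simple way to realize a pseudo-filler score from an N-best list is to treat the average of the competitor scores as the filler and normalize the top score against it. This is an assumed sketch of the general idea, not the exact formula from the paper.

```python
# Hypothetical sketch: build a pseudo-filler score from the N-best list
# by averaging the competitor (non-top) scores, then take the gap between
# the top hypothesis and that filler as the confidence (log domain).

def confidence(nbest_scores):
    """nbest_scores: acoustic/LM log scores, best hypothesis first."""
    top, rest = nbest_scores[0], nbest_scores[1:]
    filler = sum(rest) / len(rest)
    return top - filler  # large gap over the filler -> high confidence

print(confidence([-100.0, -120.0, -130.0]))  # -> 25.0 (clear winner)
print(confidence([-100.0, -101.0, -102.0]))  # -> 1.5 (ambiguous)
```

Because the filler is derived from the recognizer's own N-best output, no extra garbage model has to be trained, which is what makes such measures robust to vocabulary and model changes.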
Comparison of standard and hybrid modeling techniques for distributed speech recognition
J. Stadermann, G. Rigoll
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034608
Distributed speech recognition (DSR) is an attractive technology for mobile recognition tasks, in which the recognizer is split into two parts connected by a transmission channel. We compare the performance of standard and hybrid modeling approaches in this environment. The evaluation is done on clean and noisy speech samples taken from the TI digits and Aurora databases. Our results show that, for this task, hybrid modeling techniques can outperform standard continuous systems.
Improvement of non-negative matrix factorization based language model using exponential models
M. Novak, R. Mammone
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034619
This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition. The modeling technique borrows its basic idea from latent semantic analysis (LSA), which is typically used in information retrieval. An improvement was achieved when exponential models were used to estimate the a posteriori topic probabilities for an observed history. This method improved the perplexity of the NMF model, yielding a 24% overall perplexity improvement over a trigram language model.
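The factorization underlying such topic models decomposes a non-negative matrix V (e.g. word-document counts) into non-negative factors W and H. A toy sketch with the classic multiplicative updates follows; the dimensions and data are invented for illustration and have nothing to do with the paper's actual corpus.

```python
import numpy as np

# Toy NMF sketch (V ~= W @ H) using Lee-Seung multiplicative updates,
# minimizing squared Frobenius error. In a topic LM, columns of W would
# be topics and columns of H per-document topic weights.

def nmf(V, k, iters=500, eps=1e-9):
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update topic weights
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update topic vectors
    return W, H

V = np.array([[2.0, 0.0], [0.0, 3.0]])  # trivially factorizable example
W, H = nmf(V, k=2)
print(np.allclose(W @ H, V, atol=0.1))
```

The paper's contribution sits on top of this: replacing the way the a posteriori topic probabilities (roughly, the H weights for the current history) are estimated with an exponential model.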
Automatic evaluation methods of a speech translation system's capability
F. Sugaya, K. Yasuda, T. Takezawa, S. Yamamoto
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034661
The main goal of this paper is to propose automatic schemes for the translation paired comparison method, which the authors previously proposed for precisely evaluating a speech translation system's capability. In this method, the outputs of the speech translation system are subjectively compared with translations by native Japanese speakers who have taken the Test of English for International Communication (TOEIC), which is used as a measure of a person's speech translation capability. Experiments are conducted on TDMT, a subsystem of the Japanese-to-English speech translation system ATR-MATRIX developed at ATR Interpreting Telecommunications Research Laboratories. The winning rate of TDMT shows a good correlation with the TOEIC scores of the examinees. A regression analysis of the subjective results shows that the translation capability of TDMT matches that of a person scoring around 700 on the TOEIC. The automatic evaluation methods use DP-based similarity, calculated from DP distances between a translation output and multiple translation answers. The answers are collected by two methods: paraphrasing and querying a parallel corpus. With both types of collection, the similarity shows the same good correlation with the examinees' TOEIC scores as the subjective winning rate. Regression analysis using the similarity places the system's matched point at around 750. We also show the effects of paraphrased data.
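A DP-based similarity of the kind described can be sketched as a normalized word-level edit distance against several reference answers, keeping the best match. The normalization and weighting here are assumptions for illustration; the paper's exact definition may differ.

```python
# Sketch of DP-based similarity: dynamic-programming (edit) distance
# between a translation output and each reference answer, converted to a
# similarity in [0, 1] and maximized over the references.

def edit_distance(a, b):
    d = [[i + j if 0 in (i, j) else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i-1][j] + 1,          # deletion
                          d[i][j-1] + 1,          # insertion
                          d[i-1][j-1] + (a[i-1] != b[j-1]))  # substitution
    return d[-1][-1]

def dp_similarity(hyp, refs):
    """Best normalized similarity of hyp against multiple references."""
    h = hyp.split()
    return max(1 - edit_distance(h, r.split()) / max(len(h), len(r.split()))
               for r in refs)

print(dp_similarity("the car is red", ["the car is red", "that car was red"]))
# -> 1.0 (exact match with the first reference)
```

Collecting multiple answers (by paraphrasing or by querying a parallel corpus) matters because a single reference would penalize valid alternative translations.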
Vocabulary independent speech recognition using particles
E. Whittaker, J.M. Van Thong, P. Moreno
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034650
A method is presented for performing speech recognition that does not depend on a fixed word vocabulary. Particles, each representing a concatenated phone sequence, are used as the recognition units, which permits word-vocabulary-independent decoding. Each string of particles representing a word in the one-best hypothesis from the particle recognizer is expanded into a list of phonetically similar word candidates using a phone confusion matrix. The resulting word graph is then re-decoded with a word language model to produce the final word hypothesis. Preliminary results on the DARPA HUB4 97 and 98 evaluation sets, using word-bigram re-decoding of the particle hypotheses, show a WER between 2.2% and 2.9% higher than that of a word-bigram speech recognizer of comparable complexity. The method has potential applications in spoken document retrieval, for recovering out-of-vocabulary words, and in client-server speech recognition.
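The expansion step (phone sequence to phonetically similar word candidates) can be sketched as scoring each lexicon word by the product of phone confusion probabilities. The lexicon, phone set, and confusion values below are invented, and real systems also handle insertions and deletions, which this toy ignores.

```python
# Hypothetical sketch: expand a decoded phone sequence into word
# candidates using a phone confusion matrix. Only same-length
# substitution confusions are modeled here, for brevity.

CONFUSION = {  # P(decoded phone | lexicon phone), toy values
    ("b", "b"): 0.80, ("b", "p"): 0.20,
    ("ae", "ae"): 0.90, ("ae", "eh"): 0.10,
    ("t", "t"): 0.85, ("t", "d"): 0.15,
}

LEXICON = {"bat": ["b", "ae", "t"],
           "pat": ["p", "ae", "t"],
           "bad": ["b", "ae", "d"]}

def candidates(decoded, threshold=0.05):
    """Rank lexicon words by the product of per-phone confusion scores."""
    out = []
    for word, phones in LEXICON.items():
        if len(phones) != len(decoded):
            continue
        p = 1.0
        for dec, lex in zip(decoded, phones):
            p *= CONFUSION.get((dec, lex), 0.0)
        if p > threshold:
            out.append((word, p))
    return sorted(out, key=lambda x: -x[1])

print(candidates(["b", "ae", "t"]))  # 'bat' ranks first, then 'pat', 'bad'
```

The ranked candidates for each particle string would then form a word graph that the word language model re-decodes.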
A comparative study of model-based adaptation techniques for a compact speech recognizer
F. Thiele, R. Bippus
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034581
Many speaker adaptation techniques have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory and processing demands. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated alongside the well-known techniques of MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) adaptation. Experimental results are reported for unsupervised on-line adaptation on amounts of adaptation data ranging from 4 to 500 words per speaker. The results show that for small amounts of adaptation data it might be more efficient to employ a larger baseline acoustic model without adaptation. Eigenvoices achieve the lowest word error rates of all the adaptation techniques, but SMAP offers a good compromise between memory requirements and accuracy.
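The core of MAP adaptation, one of the techniques compared above, is a count-weighted interpolation between the prior (speaker-independent) parameter and the adaptation-data estimate. A one-dimensional sketch of the mean update, with an invented prior weight tau:

```python
# Sketch of MAP adaptation of a single Gaussian mean:
#   mu_hat = (tau * mu_prior + n * mu_data) / (tau + n)
# With little data the estimate stays near the prior mean; as the
# per-speaker data count n grows, it moves toward the data mean.
# tau is the prior weight (a tuning constant, chosen arbitrarily here).

def map_adapt_mean(prior_mean, data, tau=10.0):
    n = len(data)
    data_mean = sum(data) / n
    return (tau * prior_mean + n * data_mean) / (tau + n)

print(map_adapt_mean(0.0, [1.0] * 2))   # little data: stays near prior 0.0
print(map_adapt_mean(0.0, [1.0] * 90))  # -> 0.9, close to the data mean
```

This data-dependence is exactly why the paper finds MAP-style methods weak at 4 words per speaker, and why eigenvoices, which constrain the model to a low-dimensional speaker space, do better with very sparse adaptation data.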
Introduction of speech interface for mobile information services
H. Nakano
Pub Date: 2001-12-09 | DOI: 10.1109/ASRU.2001.1034684
Popular Japanese mobile Web-phones are widely used to connect to Internet providers (IP). The most popular service on mobile Web-phones is e-mail. Currently, users type messages using the ten standard keys on the phone. Several letters and Kana (Japanese phonetic characters) are assigned to each key, and the user steps through them by tapping the key repeatedly. After inputting several words, the user converts them into Kanji (Chinese characters). Kana-Kanji conversion is still improving, and fast text input methods have recently been introduced, but these key input methods remain troublesome. A speech interface is expected to overcome this input difficulty. However, speech interfaces suffer from several problems, both technical and social. The paper summarises these problems and looks at some methods by which technical solutions may be found.