Correlation-based query relaxation for example-based dialog modeling
Cheongjae Lee, Sungjin Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, G. G. Lee
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373242
Query relaxation refers to the process of reducing the number of constraints on a query if it returns no result when searching a database. This is an important process to enable extraction of an appropriate number of query results because queries that are too strictly constrained may return no result, whereas queries that are too loosely constrained may return too many results. This paper proposes an automated method of correlation-based query relaxation (CBQR) to select an appropriate constraint subset. The example-based dialog modeling framework was used to validate our algorithm. Preliminary results show that the proposed method facilitates the automation of query relaxation. We believe that the CBQR algorithm effectively relaxes constraints on failed queries to return more dialog examples.
{"title":"Correlation-based query relaxation for example-based dialog modeling","authors":"Cheongjae Lee, Sungjin Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, G. G. Lee","doi":"10.1109/ASRU.2009.5373242","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373242","url":null,"abstract":"Query relaxation refers to the process of reducing the number of constraints on a query if it returns no result when searching a database. This is an important process to enable extraction of an appropriate number of query results because queries that are too strictly constrained may return no result, whereas queries that are too loosely constrained may return too many results. This paper proposes an automated method of correlation-based query relaxation (CBQR) to select an appropriate constraint subset. The example-based dialog modeling framework was used to validate our algorithm. Preliminary results show that the proposed method facilitates the automation of query relaxation. We believe that the CBQR algorithm effectively relaxes constraints on failed queries to return more dialog examples.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114983670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-margin feature adaptation for automatic speech recognition
Chih-Chieh Cheng, Fei Sha, L. Saul
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373320
We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.
{"title":"Large-margin feature adaptation for automatic speech recognition","authors":"Chih-Chieh Cheng, Fei Sha, L. Saul","doi":"10.1109/ASRU.2009.5373320","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373320","url":null,"abstract":"We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116933556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating prosodic features in extractive meeting summarization
Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373302
Speech contains information beyond the text itself that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features, we achieve better performance than using the non-prosodic information on both human transcripts and recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gains, outperforming the individual models.
{"title":"Integrating prosodic features in extractive meeting summarization","authors":"Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu","doi":"10.1109/ASRU.2009.5373302","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373302","url":null,"abstract":"Speech contains additional information than text that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features we achieve better performance than using the non-prosodic information on both the human transcripts and recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gain, outperforming the individual models.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129787891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multilingual speaker age recognition: Regression analyses on the Lwazi corpus
M. Feld, E. Barnard, C. V. Heerden, Christian A. Müller
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373374
Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important para-linguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker. In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.
{"title":"Multilingual speaker age recognition: Regression analyses on the Lwazi corpus","authors":"M. Feld, E. Barnard, C. V. Heerden, Christian A. Müller","doi":"10.1109/ASRU.2009.5373374","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373374","url":null,"abstract":"Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important para-linguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker. In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128285934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging speech production knowledge for improved speech recognition
A. Sangwan, J. Hansen
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373368
This study presents a novel phonological methodology for speech recognition based on phonological features (PFs), which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing the speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the Maximum Entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to an ME-HMM (maximum entropy-hidden Markov model), which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists, where absolute WRA (word recognition accuracy) increases of 1.7%, 1.9%, and 1.0% are reported for the TIMIT, NTIMIT, and SPINE (speech in noise) corpora, respectively (15.5% and 22.5% relative reductions in word error rate for TIMIT and NTIMIT).
{"title":"Leveraging speech production knowledge for improved speech recognition","authors":"A. Sangwan, J. Hansen","doi":"10.1109/ASRU.2009.5373368","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373368","url":null,"abstract":"This study presents a novel phonological methodology for speech recognition based on phonological features (PFs) which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the Maximum Entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to a ME-HMM (maximum entropy-hidden Markov model) which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists where an absolute WRA (word recognition rate) increase of 1.7%, 1.9% and 1% are reported for TIMIT, NTIMIT, and the SPINE (speech in noise) corpora (15.5% and 22.5% relative reduction in word error rate for TIMIT and NTIMIT).","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130450010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The exploration/exploitation trade-off in Reinforcement Learning for dialogue management
S. Varges, G. Riccardi, S. Quarteroni, A. Ivanov
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373260
Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.
{"title":"The exploration/exploitation trade-off in Reinforcement Learning for dialogue management","authors":"S. Varges, G. Riccardi, S. Quarteroni, A. Ivanov","doi":"10.1109/ASRU.2009.5373260","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373260","url":null,"abstract":"Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125354298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From speech to letters - using a novel neural network architecture for grapheme based ASR
F. Eyben, M. Wöllmer, Björn Schuller, Alex Graves
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373257
Mainstream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language-model-based decoding techniques are applied to transform the phoneme hypotheses into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMMs has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In the design of such a grapheme-based speech recognition system, phonemisation dictionaries are no longer required; all that is needed is text transcribed at the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.
{"title":"From speech to letters - using a novel neural network architecture for grapheme based ASR","authors":"F. Eyben, M. Wöllmer, Björn Schuller, Alex Graves","doi":"10.1109/ASRU.2009.5373257","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373257","url":null,"abstract":"Main-stream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language model based decoding techniques are applied to transform the phoneme hypothesis into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMM has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In design of such a grapheme based speech recognition system phonemisation dictionaries are no longer required. All that is needed is text transcribed on the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121068028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology-based grounding of Spoken Language Understanding
S. Quarteroni, Marco Dinarelli, G. Riccardi
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373500
Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and re-ranking of spoken language understanding interpretations.
{"title":"Ontology-based grounding of Spoken Language Understanding","authors":"S. Quarteroni, Marco Dinarelli, G. Riccardi","doi":"10.1109/ASRU.2009.5373500","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373500","url":null,"abstract":"Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and re-ranking of spoken language understanding interpretations.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121052310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards integrated machine translation using structural alignment from syntax-augmented synchronous parsing
Bing Xiang, Bowen Zhou, Martin Cmejrek
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372892
In current statistical machine translation, IBM-model-based word alignment is widely used as a starting point to build phrase-based machine translation systems. However, such an alignment model is separate from the rest of the machine translation pipeline and is optimized independently. Furthermore, structural information is not taken into account in the alignment model, which sometimes leads to incorrect alignments. In this paper, we present a novel method to connect a re-alignment model with a translation model in an integrated framework. We conduct bilingual chart parsing based on a syntax-augmented synchronous context-free grammar. A Viterbi derivation tree is generated for each sentence pair, with multiple features employed in a log-linear model. A new word alignment is created under the structural constraint from the Viterbi tree. Extensive experiments are conducted on a Farsi-to-English translation task in the conversational speech domain and a German-to-English translation task in the text domain. Systems trained on the new alignment provide significantly higher BLEU scores than a state-of-the-art baseline.
{"title":"Towards integrated machine translation using structural alignment from syntax-augmented synchronous parsing","authors":"Bing Xiang, Bowen Zhou, Martin Cmejrek","doi":"10.1109/ASRU.2009.5372892","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372892","url":null,"abstract":"In current statistical machine translation, IBM model based word alignment is widely used as a starting point to build phrase-based machine translation systems. However, such alignment model is separated from the rest of machine translation pipeline and optimized independently. Furthermore, structural information is not taken into account in the alignment model, which sometimes leads to incorrect alignments. In this paper, we present a novel method to connect a re-alignment model with a translation model in an integrated framework. We conduct bilingual chart parsing based on syntax-augmented synchronous context-free grammar. A Viterbi derivation tree is generated for each sentence pair with multiple features employed in a log-linear model. A new word alignment is created under the structural constraint from the Viterbi tree. Extensive experiments are conducted in a Farsi-to-English translation task in conversational speech domain and also a German-to-English translation task in text domain. Systems trained on the new alignment provide significant higher BLEU scores compared to a state-of-the-art baseline.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response timing generation and response type selection for a spontaneous spoken dialog system
Ryota Nishimura, S. Nakagawa
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372898
If a dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a dialog system that emulates human behavior in a chat-like dialog. The proposed system makes use of a decision tree to generate chat-like responses at the appropriate times. These responses include "aizuchi" (back-channel), "repetition", "collaborative completion", etc. The system also reacts robustly to the user's overlapping utterances (barge-in) and disfluencies. A subjective evaluation shows a high degree of naturalness in the timing of ordinary responses, overlap, and aizuchi, and that the dialog system exhibits user-friendly behavior. The version of the system using recorded voices was preferred; almost all subjects felt a sense of familiarity from the aizuchi, and the barge-in handling was also judged useful.
{"title":"Response timing generation and response type selection for a spontaneous spoken dialog system","authors":"Ryota Nishimura, S. Nakagawa","doi":"10.1109/ASRU.2009.5372898","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372898","url":null,"abstract":"If a dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a dialog system that emulates human behavior in a chat-like dialog. The proposed system makes use of a decision tree to generate chat-like responses at the appropriate times. These responses include “aizuchi” (back-channel), “repetition”, “collaborative completion”, etc. The system also reacts robustly to the user's overlapping utterances (barge-in) and disfluencies. The subjective evaluation shows that there is a high degree of naturalness in the timing of ordinary responses, overlap, and aizuchi, and that the dialog system exhibits user-friendly behavior. The recorded voices system was preferred, and almost all subjects felt familiarity with aizuchi, and the barge-in was also useful.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114281868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}