2011 IEEE Workshop on Automatic Speech Recognition & Understanding: Latest Publications
Unsupervised learning in cross-corpus acoustic emotion recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163986
Zixing Zhang, F. Weninger, M. Wöllmer, Björn Schuller
One of the ever-present bottlenecks in Automatic Emotion Recognition is data sparseness. We therefore investigate the suitability of unsupervised learning in cross-corpus acoustic emotion recognition through a large-scale study with six commonly used databases, including acted and natural emotion speech, and covering a variety of application scenarios and acoustic conditions. We show that adding unlabeled emotional speech to agglomerated multi-corpus training sets can enhance recognition performance even in a challenging cross-corpus setting; furthermore, we show that the expected gain from adding unlabeled data is, on average, approximately half of that achieved by additional manually labeled data in leave-one-corpus-out validation.
Citations: 111
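The leave-one-corpus-out validation referenced above can be sketched as a simple loop: each corpus in turn is held out for testing while the remaining corpora are agglomerated for training. This is a generic illustration of the protocol, not the authors' code; the `train_fn` and `eval_fn` callables and the corpus names are hypothetical stand-ins.

```python
def leave_one_corpus_out(corpora, train_fn, eval_fn):
    """Leave-one-corpus-out validation.

    corpora: dict mapping corpus name -> list of examples.
    train_fn: callable building a model from a combined training list.
    eval_fn: callable scoring a model on a held-out corpus.
    Returns a dict mapping each corpus name to its held-out score.
    """
    scores = {}
    for held_out in corpora:
        # Agglomerate every corpus except the held-out one into one training set.
        train_data = [x for name, data in corpora.items()
                      if name != held_out for x in data]
        model = train_fn(train_data)
        scores[held_out] = eval_fn(model, corpora[held_out])
    return scores
```

In the paper's setting, `train_fn` would additionally fold in unlabeled data; the loop structure stays the same.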
Alignment of spoken narratives for automated neuropsychological assessment
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163979
Emily Tucker Prud'hommeaux, Brian Roark
Narrative recall tasks are commonly included in neurological examinations, as deficits in narrative memory are associated with disorders such as Alzheimer's dementia. We explore methods for automatically scoring narrative retellings via alignment to a source narrative. Standard alignment methods, designed for large bilingual corpora for machine translation, yield high alignment error rates (AER) on our small monolingual corpora. We present modifications to these methods that obtain a decrease in AER, an increase in scoring accuracy, and diagnostic classification performance comparable to that of manual methods, thus demonstrating the utility of these techniques for this task and other tasks relying on monolingual alignments.
Citations: 14
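The alignment error rate (AER) mentioned above is conventionally computed from a hypothesis alignment A against gold sure links S and possible links P (with S ⊆ P), as AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|). A minimal sketch of that standard definition, not the authors' implementation:

```python
def alignment_error_rate(sure, possible, hyp):
    """AER over alignments given as sets of (source_index, target_index) links.

    sure: gold links every annotator agrees on (S).
    possible: gold links including uncertain ones (P), with S a subset of P.
    hyp: the hypothesis alignment (A). Lower is better; 0.0 is perfect.
    """
    sure, possible, hyp = set(sure), set(possible), set(hyp)
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))
```

A hypothesis that recovers exactly the sure links scores 0.0, since every hypothesized link is also a possible link.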
Minimum Bayes risk discriminative language models for Arabic speech recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163932
H. Kuo, E. Arisoy, L. Mangu, G. Saon
In this paper we explore discriminative language modeling (DLM) on highly optimized state-of-the-art large vocabulary Arabic broadcast speech recognition systems used for the Phase 5 DARPA GALE Evaluation. In particular, we study in detail a minimum Bayes risk (MBR) criterion for DLM. MBR training outperforms perceptron training. Interestingly, we found that our DLMs generalized to mismatched conditions, such as using a different acoustic model during testing. We also examine the interesting problem of unsupervised DLM training using a Bayes risk metric as a surrogate for word error rate (WER). In some experiments, we were able to obtain about half of the gain of the supervised DLM.
Citations: 18
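The minimum Bayes risk idea underlying the criterion above is to prefer the hypothesis with the smallest expected loss under the model's posterior, rather than the single most probable one. A toy sketch of MBR selection over an N-best list with Levenshtein (word edit) distance as the loss; this illustrates the decision rule only, not the paper's DLM training procedure:

```python
def edit_distance(a, b):
    """Levenshtein distance between two word sequences (rolling 1-D DP)."""
    n = len(b)
    d = list(range(n + 1))
    for i in range(1, len(a) + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (a[i - 1] != b[j - 1]))
    return d[n]

def mbr_select(nbest):
    """nbest: list of (word_tuple, posterior) pairs.

    Returns the hypothesis minimizing expected edit distance to the
    other hypotheses, weighting each by its posterior probability.
    """
    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(p * edit_distance(hyp, ref) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best
```

The paper's MBR training optimizes model parameters against a Bayes risk objective of this flavor instead of applying it only at decode time.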
Subword-based multi-span pronunciation adaptation for recognizing accented speech
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163940
Timo Mertens, Kit Thambiratnam, F. Seide
We investigate automatic pronunciation adaptation for non-native accented speech by using statistical models trained on multi-span linguistic parse tables to generate candidate mispronunciations for a target language. Compared to traditional phone re-writing rules, parse table modeling captures more context in the form of phone-clusters or syllables, and encodes abstract features such as word-internal position or syllable structure. The proposed approach is attractive because it gives a unified method for combining multiple levels of linguistic information. The reported experiments demonstrate word error rate reductions of up to 7.9% and 3.3% absolute on Italian and German accented English using lexicon adaptation alone, and 12.4% and 11.3% absolute when combined with acoustic adaptation.
Citations: 0
Building a conversational model from two-tweets
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163953
Ryuichiro Higashinaka, N. Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, H. Inagaki
The current problem in building a conversational model from Twitter data is the scarcity of long conversations. According to our statistics, more than 90% of conversations in Twitter are composed of just two tweets. Previous work has utilized only conversations lasting longer than three tweets for dialogue modeling so that more than a single interaction can be successfully modeled. This paper verifies, by experiment, that two-tweet exchanges alone can lead to conversational models that are comparable to those made from longer-tweet conversations. This finding leverages the value of Twitter as a dialogue corpus and opens the possibility of better conversational modeling using Twitter data.
Citations: 15
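Harvesting exactly-two-tweet exchanges of the kind the paper builds on amounts to keeping reply pairs whose parent is a conversation root and whose reply received no further replies. A sketch under assumed data: the tweet fields (`id`, `in_reply_to`, `text`) are hypothetical placeholders, not an actual Twitter API schema.

```python
def extract_two_tweet_pairs(tweets):
    """Return (prompt, response) text pairs for exactly-two-tweet exchanges.

    tweets: list of dicts with hypothetical fields 'id', 'in_reply_to'
    (None for a root tweet), and 'text'.
    """
    by_id = {t["id"]: t for t in tweets}
    # IDs that received at least one reply.
    replied_to = {t["in_reply_to"] for t in tweets if t["in_reply_to"]}
    pairs = []
    for t in tweets:
        parent = by_id.get(t["in_reply_to"])
        if parent is None:
            continue
        # Keep only two-tweet exchanges: the parent is a conversation root
        # and this reply itself received no replies.
        if parent["in_reply_to"] is None and t["id"] not in replied_to:
            pairs.append((parent["text"], t["text"]))
    return pairs
```

Longer chains (three or more tweets) are excluded, mirroring the paper's contrast between two-tweet and longer-conversation training sets.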
An investigation of heuristic, manual and statistical pronunciation derivation for Pashto
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163939
U. Chaudhari, Xiaodong Cui, Bowen Zhou, Rong Zhang
In this paper, we study the issue of generating pronunciations for training and decoding with an ASR system for Pashto in the context of a Speech to Speech Translation system developed for TRANSTAC. As with other low resourced languages, a limited amount of acoustic training data was available with a corresponding set of manually produced vowelized pronunciations. We augment this data with other sources, but lack pronunciations for unseen words in the new audio and associated text. Four methods are investigated for generating these pronunciations, or baseforms: a heuristic grapheme to phoneme map, manual annotation, and two methods based on statistical models. The first of these uses a joint Maximum Entropy N-gram model while the other is based on a log-linear Statistical Machine Translation model. We report results on a state of the art, discriminatively trained ASR system and show that the manual and statistical methods provide an improvement over the grapheme to phoneme map.
Moreover, we demonstrate that the automatic statistical methods can perform as well as or better than manual generation by native speakers, even in the case where we have a significant number of high quality, manually generated pronunciations beyond those provided by the TRANSTAC program.
Citations: 0
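A heuristic grapheme-to-phoneme map, the first of the four methods listed, can be sketched as a longest-match rewrite over grapheme units. The mapping entries below are invented placeholders for illustration, not actual Pashto orthography or phones:

```python
def graphemes_to_phones(word, g2p):
    """Greedy longest-match grapheme-to-phoneme conversion.

    word: orthographic string; g2p: dict mapping grapheme units
    (1-2 characters here) to phone symbols. Unmapped characters are skipped.
    """
    phones, i = [], 0
    while i < len(word):
        for span in (2, 1):  # prefer the longer grapheme unit (e.g. a digraph)
            unit = word[i:i + span]
            if unit in g2p:
                phones.append(g2p[unit])
                i += span
                break
        else:
            i += 1  # no mapping for this character; skip it
    return phones
```

The paper's statistical methods (joint Maximum Entropy N-gram and log-linear SMT models) replace this deterministic table with learned, context-sensitive mappings.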
A novel bottleneck-BLSTM front-end for feature-level context modeling in conversational speech recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163902
M. Wöllmer, Björn Schuller, G. Rigoll
We present a novel automatic speech recognition (ASR) front-end that unites Long Short-Term Memory context modeling, bidirectional speech processing, and bottleneck (BN) networks for enhanced Tandem speech feature generation. Bidirectional Long Short-Term Memory (BLSTM) networks were shown to be well suited for phoneme recognition and probabilistic feature extraction since they efficiently incorporate a flexible amount of long-range temporal context, leading to better ASR results than conventional recurrent networks or multi-layer perceptrons. Combining BLSTM modeling and bottleneck feature generation allows us to produce feature vectors of arbitrary size, independent of the network training targets. Experiments on the COSINE and the Buckeye corpora containing spontaneous, conversational speech show that the proposed BN-BLSTM front-end leads to better ASR accuracies than previously proposed BLSTM-based Tandem and multi-stream systems.
Citations: 14
A variational perspective on noise-robust speech recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163917
R. V. Dalen, M. Gales
Model compensation methods for noise-robust speech recognition have shown good performance. Predictive linear transformations can approximate these methods to balance computational complexity and compensation accuracy. This paper examines both of these approaches from a variational perspective. Using a matched-pair approximation at the component level yields a number of standard forms of model compensation and predictive linear transformations. However, a tighter bound can be obtained by using variational approximations at the state level. Both model-based and predictive linear transform schemes can be implemented in this framework. Preliminary results show that the tighter bound obtained from the state-level variational approach can yield improved performance over standard schemes.
Citations: 4
Investigating the role of machine translated text in ASR domain adaptation: Unsupervised and semi-supervised methods
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163941
H. Cucu, L. Besacier, C. Burileanu, Andi Buzo
This study investigates the use of machine translated text for ASR domain adaptation. The proposed methodology is applicable when domain-specific data is available in language X only, whereas the goal is to develop a domain-specific system in language Y. Two semi-supervised methods are introduced and compared with a fully unsupervised approach, which represents the baseline. While both the unsupervised and semi-supervised approaches allow an accurate domain-specific ASR system to be developed quickly, the semi-supervised approaches outperform the unsupervised one by 10% to 29% relative, depending on the amount of human post-processed data available. An in-depth analysis, explaining how the machine translated text improves the performance of the domain-specific ASR, is also given at the end of this paper.
Citations: 13
Wizard of Oz evaluation of listening-oriented dialogue control using POMDP
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163951
Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka
We have been working on dialogue control for listening agents. In our previous study [1], we proposed a dialogue control method that maximizes user satisfaction using partially observable Markov decision processes (POMDPs) and evaluated it by a dialogue simulation. We found that it significantly outperforms other stochastic dialogue control methods. However, this result does not necessarily mean that our method works as well in real dialogues with human users. Therefore, in this paper, we evaluate our dialogue control method by a Wizard of Oz (WoZ) experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This paper is the first to show the usefulness of POMDP-based dialogue control using human users when the target function is to maximize user satisfaction.
Citations: 7