
Latest publications: 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Robust speech recognition by properly utilizing reliable frames and segments in corrupted signals
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430091
Yi Chen, C. Wan, Lin-Shan Lee
In this paper, we propose a new approach to detecting and utilizing reliable frames and segments in corrupted signals for robust speech recognition. Novel approaches to estimating an energy-based measure and a harmonicity measure for each frame are developed. SNR-dependent GMM classifiers are then trained, together with a reliable frame selection and clustering module and a reliable segment identification module, to detect the most reliable frames in an utterance. These reliable frames and segments thus obtained can be properly used in both front-end feature enhancement and back-end Viterbi decoding. In the extensive experiments reported here, very significant improvements in recognition accuracies were obtained with the proposed approaches for all types of noise and all SNR values defined in the Aurora 2 database.
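As a rough illustration of the two frame-level measures, here is a minimal sketch assuming 16 kHz audio, 25 ms frames with a 10 ms hop, and an autocorrelation-peak definition of harmonicity; the paper's exact estimators and its SNR-dependent GMM classifiers are not reproduced.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_energy(frames):
    """Energy-based measure: per-frame log energy."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def harmonicity(frames, min_lag=40, max_lag=320):
    """Harmonicity measure: normalized autocorrelation peak over a plausible
    pitch-lag range (50-400 Hz at 16 kHz); near 1 for voiced speech, near 0
    for noise-dominated frames."""
    scores = np.empty(len(frames))
    for i, f in enumerate(frames):
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        scores[i] = ac[min_lag:max_lag].max() / (ac[0] + 1e-10)
    return scores

# An SNR-dependent GMM classifier trained on [log_energy, harmonicity]
# vectors would then flag the most reliable frames of an utterance.
```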
Citations: 1
Building a highly accurate Mandarin speech recognizer
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430161
M. Hwang, Gang Peng, Wen Wang, Arlo Faria, A. Heidel, Mari Ostendorf
We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. Particularly, we build two acoustic models (AMs) with significant differences but similar accuracy for the purposes of cross adaptation and system combination. This paper elaborates on the main differences between the two systems, where one recognizer incorporates a discriminatively trained feature while the other utilizes a discriminative feature transformation. Additionally we present an improved acoustic segmentation algorithm and topic-based language model (LM) adaptation. Coupled with increased acoustic training data, we reduced the character error rate (CER) of the DARPA GALE 2006 evaluation set to 15.3% from 18.4%.
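The gains above are reported as character error rate (CER). As a reference for that metric, a standard single-row Levenshtein implementation follows; this is not code from the paper, and the example strings are invented.

```python
def cer(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    d = list(range(len(hyp) + 1))  # d[j] = distance(ref[:i], hyp[:j])
    for i, r in enumerate(ref, 1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,              # delete r
                d[j - 1] + 1,          # insert h
                prev_diag + (r != h),  # substitute (or match)
            )
    return d[-1] / max(len(ref), 1)

print(cer("北京天气", "北京天汽"))  # 0.25: one substituted character
```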
Citations: 28
Minimum mutual information beamforming for simultaneous active speakers
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430086
K. Kumatani, U. Mayer, Tobias Gehrig, Emilian Stoimenov, J. McDonough, Matthias Wölfel
In this work, we address an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in generalized sidelobe canceller (GSC) configuration for each source. In contrast to normal practice, we then jointly adjust the active weight vectors of both GSCs to obtain two output signals with minimum mutual information (MMI). In order to calculate the mutual information of the complex subband snapshots, we consider four probability density functions (pdfs), namely the Gaussian, Laplace, K0 and Γ pdfs. The latter three belong to the class of super-Gaussian density functions that are typically used in independent component analysis as opposed to conventional beamforming. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the PASCAL Speech Separation Challenge. In the experiments, the delay-and-sum beamformer achieved a word error rate (WER) of 70.4%. The MMI beamformer under a Gaussian assumption achieved 55.2% WER, which was further reduced to 52.0% with a K0 pdf, whereas the WER for data recorded with a close-talking microphone was 21.6%.
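Under the Gaussian assumption, the mutual information between two zero-mean complex beamformer outputs has a closed form in their correlation coefficient, I(Y1; Y2) = -log(1 - |ρ|²), which makes the objective easy to illustrate. The sketch below evaluates only that objective on invented toy signals; the GSC active-weight optimization and the super-Gaussian variants are not reproduced.

```python
import numpy as np

def gaussian_mutual_information(y1, y2):
    """Empirical MI of two zero-mean complex signals under a joint
    circular-Gaussian model: I = -log(1 - |rho|^2)."""
    rho = np.vdot(y1, y2) / np.sqrt(np.vdot(y1, y1).real * np.vdot(y2, y2).real)
    return -np.log(1.0 - min(abs(rho) ** 2, 1.0 - 1e-12))

# Toy check: well-separated outputs give MI near 0; cross-talk raises it.
rng = np.random.default_rng(0)
a = rng.normal(size=4096) + 1j * rng.normal(size=4096)
b = rng.normal(size=4096) + 1j * rng.normal(size=4096)
print(gaussian_mutual_information(a, b))            # ~0: independent sources
print(gaussian_mutual_information(a, a + 0.1 * b))  # large: leakage between outputs
```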
Citations: 6
Development of the 2007 RWTH Mandarin LVCSR system
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430155
Björn Hoffmeister, Christian Plahl, P. Fritz, G. Heigold, J. Lööf, R. Schlüter, H. Ney
This paper describes the development of the RWTH Mandarin LVCSR system. Different acoustic front-ends, together with multiple-system cross-adaptation, are used in a two-stage decoding framework. We describe the system in detail and present systematic recognition results. In particular, we compare a variety of approaches for cross-adapting to multiple systems. During development we conducted a comparative study of different methods for integrating tone and phoneme posterior features. Furthermore, we apply lattice-based consensus decoding and system combination methods. In these methods, the effect of minimizing character errors instead of word errors is compared. The final system obtains a character error rate of 17.7% on the GALE 2006 evaluation data.
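As a hedged sketch of the consensus-decoding step: once lattices are collapsed into a confusion network, choosing the highest-posterior unit in each slot minimizes the expected unit error rate, and running the same procedure over character slots rather than word slots targets CER instead of WER. The toy network below is invented, not GALE data.

```python
def consensus_decode(confusion_network):
    """Return the minimum-expected-error hypothesis, one unit per slot."""
    hyp = []
    for slot in confusion_network:
        unit, _ = max(slot.items(), key=lambda kv: kv[1])
        if unit != "<eps>":  # skip null (skip-arc) hypotheses
            hyp.append(unit)
    return hyp

toy_cn = [{"今": 0.8, "金": 0.2}, {"天": 0.9, "<eps>": 0.1}, {"好": 0.6, "号": 0.4}]
print("".join(consensus_decode(toy_cn)))  # 今天好
```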
Citations: 23
Adapting grapheme-to-phoneme conversion for name recognition
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430097
Xiao Li, A. Gunawardana, A. Acero
This work investigates the use of acoustic data to improve grapheme-to-phoneme conversion for name recognition. We introduce a joint model of acoustics and graphonemes, and present two approaches, maximum likelihood training and discriminative training, in adapting graphoneme model parameters. Experiments on a large-scale voice-dialing system show that the maximum likelihood approach yields a relative 7% reduction in SER compared to the best baseline result we obtained without leveraging acoustic data, while discriminative training enlarges the SER reduction to 12%.
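A minimal sketch of the underlying idea, rather than the paper's joint graphoneme model: candidate pronunciations for a name are rescored by combining a G2P model score with the acoustic likelihood the recognizer assigns to each candidate. The weight lam, the score tables, and the toy candidates are all invented placeholders.

```python
def pick_pronunciation(candidates, g2p_logprob, acoustic_loglik, lam=1.0):
    """Return the phone sequence maximizing log P_acoustic + lam * log P_g2p."""
    return max(candidates, key=lambda p: acoustic_loglik(p) + lam * g2p_logprob(p))

# Toy usage with invented scores for the name "Xiao":
candidates = [("sh", "aw"), ("z", "iy", "aw"), ("k", "s", "iy", "aw")]
g2p = {candidates[0]: -1.2, candidates[1]: -0.9, candidates[2]: -2.5}
ac = {candidates[0]: -10.0, candidates[1]: -8.5, candidates[2]: -12.0}
print(pick_pronunciation(candidates, g2p.get, ac.get))  # ('z', 'iy', 'aw')
```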
Citations: 25
Interpolative variable frame rate transmission of speech features for distributed speech recognition
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430179
Huiqun Deng, D. O'Shaughnessy, Jean-Guy Dahan, W. Ganong
In distributed speech recognition, vector quantization is used to reduce the number of bits for coding speech features at the user end in order to save energy for transmitting speech feature streams to remote recognizers and reduce data traffic congestion. We notice that the overall bit rate of the transmitted feature streams could be further reduced by not sending redundant frames that can be interpolated at the remote server from received frames. Interpolation introduces errors and may degrade speech recognition. This paper investigates the methods of selecting frames for transmission and the effect of interpolation on recognition. Experiments on a large vocabulary recognizer show that with spline interpolation, the overall frame rate for transmission can be reduced by about 50% with a relative increase in word error rate less than 5.2% for clean and noisy speech.
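The server-side half of the scheme is easy to sketch: given the subset of frames the client sent, the dropped frames are filled in with splines over each feature dimension. The 13-dimensional features, the fixed keep-every-other-frame pattern, and the use of scipy's CubicSpline are assumptions for illustration; a real client would keep exactly the frames whose absence is hardest to interpolate.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def reconstruct(kept_idx, kept_frames, n_frames):
    """kept_frames: (len(kept_idx), dim) matrix of transmitted frames.
    Returns (n_frames, dim) with the missing frames spline-interpolated."""
    return CubicSpline(kept_idx, kept_frames, axis=0)(np.arange(n_frames))

# Toy example: 100 frames of smooth 13-dim trajectories, every other frame sent.
rng = np.random.default_rng(0)
feats = np.cumsum(rng.normal(size=(100, 13)), axis=0)
kept = np.arange(0, 100, 2)
recon = reconstruct(kept, feats[kept], 100)
print(np.abs(recon - feats).mean())  # mean interpolation error on dropped frames
```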
Citations: 6
Recognition and understanding of meetings: the AMI and AMIDA projects
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430116
S. Renals, Thomas Hain, H. Bourlard
The AMI and AMIDA projects are concerned with the recognition and interpretation of multiparty meetings. Within these projects we have: developed an infrastructure for recording meetings using multiple microphones and cameras; released a 100 hour annotated corpus of meetings; developed techniques for the recognition and interpretation of meetings based primarily on speech recognition and computer vision; and developed an evaluation framework at both component and system levels. In this paper we present an overview of these projects, with an emphasis on speech recognition and content extraction.
Citations: 144
Voice/audio information retrieval: minimizing the need for human ears
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430183
M. Clements, M. Gavaldà
This paper discusses the challenges of building information retrieval applications that operate on large amounts of voice/audio data. Various problems and issues are presented along with proposed solutions. A set of techniques based on a phonetic keyword spotting approach is presented, together with examples of concrete applications that solve real-life problems.
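A hedged sketch of the phonetic keyword spotting idea: decode the audio once into a phone stream, then match new query keywords by approximate string matching, so adding a keyword never requires re-running recognition. The phone symbols, the fixed-length window (which only absorbs substitutions), and the cost threshold are invented simplifications; production systems typically search phone lattices with posteriors rather than a single phone string.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    d = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, pb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (pa != pb))
    return d[-1]

def spot(query_phones, decoded_phones, max_cost=1):
    """Yield offsets where the query matches within max_cost edits."""
    n = len(query_phones)
    for start in range(len(decoded_phones) - n + 1):
        if phone_edit_distance(query_phones, decoded_phones[start:start + n]) <= max_cost:
            yield start

decoded = "sil h eh l ow w er l d sil".split()
print(list(spot("hh eh l ow".split(), decoded)))  # fuzzy hit at offset 1
```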
Citations: 3
Unsupervised state clustering for stochastic dialog management
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430171
F. Lefèvre, R. Mori
Following recent studies in stochastic dialog management, this paper introduces an unsupervised approach aiming to reduce the cost and complexity of setting up a probabilistic POMDP-based dialog manager. The proposed method is based on a first decoding step that derives basic semantic constituents from user utterances. These isolated units and some relevant context features (such as previous system actions and previous user utterances) are combined to form vectors representing the ongoing dialog states. After a clustering step, each partition of this space is intended to represent a particular dialog state. Any new utterance can then be classified according to these automatic states, and the belief state can be updated before the POMDP-based dialog manager decides on the best next action to perform. The proposed approach is applied to the French MEDIA task (tourist information and hotel booking). The 10k-utterance MEDIA training corpus is semantically rich (over 80 basic concepts) and is segmentally annotated in terms of basic concepts. Before user trials can be carried out, some insights into the method's effectiveness are obtained by analyzing the convergence of the POMDP models.
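A hedged sketch of the unsupervised state construction: dialog-context vectors (semantic units plus history features) are clustered, each cluster standing in for a dialog state, and a new utterance's vector is soft-assigned over the clusters to update the belief state. The random vectors, cluster count, and distance-softmax belief update are invented stand-ins; the MEDIA concept inventory and the POMDP policy itself are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
context_vectors = rng.random((500, 20))  # stand-in for featurized dialog turns

states = KMeans(n_clusters=8, n_init=10, random_state=0).fit(context_vectors)

def belief_update(vec, kmeans, temperature=1.0):
    """Soft assignment over the automatic states from centroid distances."""
    dist = np.linalg.norm(kmeans.cluster_centers_ - vec, axis=1)
    logits = -dist / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

print(belief_update(rng.random(20), states).round(3))  # belief over 8 states
```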
Citations: 8
Speech enhancement using PCA and variance of the reconstruction error in distributed speech recognition
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430077
Amin Haji Abolhassani, S. Selouani, D. O'Shaughnessy
We present in this paper a signal subspace-based approach for enhancing a noisy signal. This algorithm is based on principal component analysis (PCA), in which the optimal subspace selection is provided by a variance of the reconstruction error (VRE) criterion. This choice overcomes many limitations encountered with other selection criteria, such as over-estimation of the signal subspace or the need for empirical parameters. We have also extended our subspace algorithm to handle the case of colored and babble noise. The performance evaluation, carried out on the Aurora database, measures improvements in the distributed speech recognition of signals corrupted by different types of additive noise. Our algorithm succeeds in improving the recognition of noisy speech in all noise conditions.
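A hedged sketch of the enhancement scheme on frame vectors: eigendecompose the covariance, choose the subspace rank with a VRE-style criterion, and project the data onto that signal subspace. The criterion below follows the Qin & Dunia idea of reconstructing each variable from the rank-k model, but it is a simplified stand-in for the paper's exact VRE, and the colored-noise extension is not included.

```python
import numpy as np

def vre_score(cov, eigvecs, k):
    """Accumulated, variance-normalized reconstruction-error variance when
    each variable is reconstructed from the rank-k PCA model."""
    dim = cov.shape[0]
    P = eigvecs[:, :k]
    Ct = np.eye(dim) - P @ P.T            # projector onto the residual space
    M = Ct @ cov @ Ct
    total = 0.0
    for i in range(dim):
        if Ct[i, i] < 1e-12:
            return np.inf                 # variable i not reconstructible
        total += (M[i, i] / Ct[i, i] ** 2) / max(cov[i, i], 1e-12)
    return total

def pca_enhance(frames):
    """frames: (n_frames, dim) noisy observations -> enhanced observations."""
    mean = frames.mean(axis=0)
    x = frames - mean
    cov = x.T @ x / len(x)
    _, v = np.linalg.eigh(cov)
    v = v[:, ::-1]                        # eigenvectors, descending variance
    k = min(range(1, frames.shape[1]), key=lambda r: vre_score(cov, v, r))
    basis = v[:, :k]
    return x @ basis @ basis.T + mean     # keep only the signal subspace

# Toy check: rank-3 signal in 12 dims plus noise; compare errors before/after.
rng = np.random.default_rng(0)
clean = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 12))
noisy = clean + 0.3 * rng.normal(size=clean.shape)
print(np.abs(noisy - clean).mean(), np.abs(pca_enhance(noisy) - clean).mean())
```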
Citations: 13