Soundbite identification using reference and automatic transcripts of broadcast news speech
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430189
F. Liu, Yang Liu
Soundbite identification in broadcast news is important for locating information useful for question answering, mining the opinions of a particular person, and enriching speech recognition output with quotation marks. This paper presents a systematic study of the problem under a classification framework, covering problem formulation, feature extraction, and the effect of using automatic speech recognition (ASR) output and automatic sentence boundary detection. Our experiments on a Mandarin broadcast news speech corpus show that the three-way classification framework outperforms binary classification, and that the entropy-based feature weighting method generally performs better than the alternatives. Using ASR output degrades system performance, with more of the degradation attributable to automatic sentence segmentation than to speech recognition errors for this task, especially in the recall rate.
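The abstract does not spell out the weighting scheme; a minimal sketch of one common entropy-based term weighting (an illustration under that assumption, not necessarily the authors' exact formula) down-weights terms whose occurrences are spread evenly across documents:

import math
from collections import Counter

def entropy_weights(docs):
    """Entropy-based term weighting over a list of token lists.
    A term spread evenly across documents has high entropy and gets a
    weight near 0; a term concentrated in few documents gets a weight
    near 1. Assumes at least two documents."""
    n_docs = len(docs)
    counts = [Counter(d) for d in docs]
    totals = Counter()
    for c in counts:
        totals.update(c)
    weights = {}
    h_max = math.log(n_docs)
    for term, total in totals.items():
        h = 0.0
        for c in counts:
            p = c[term] / total          # share of the term's mass in this doc
            if p > 0:
                h -= p * math.log(p)
        weights[term] = 1.0 - h / h_max  # normalized entropy, flipped
    return weights

Such weights would then scale the lexical features fed to the three-way classifier.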
{"title":"Soundbite identification using reference and automatic transcripts of broadcast news speech","authors":"F. Liu, Yang Liu","doi":"10.1109/ASRU.2007.4430189","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430189","url":null,"abstract":"Soundbite identification in broadcast news is important for locating information useful for question answering, mining opinions of a particular person, and enriching speech recognition output with quotation marks. This paper presents a systematic study of this problem under a classification framework, including problem formulation for classification, feature extraction, and the effect of using automatic speech recognition (ASR) output and automatic sentence boundary detection. Our experiments on a Mandarin broadcast news speech corpus show that the three-way classification framework outperforms the binary classification. The entropy-based feature weighting method generally performs better than others. Using ASR output degrades system performance, with more degradation observed from using automatic sentence segmentation than speech recognition errors for this task, especially on the recall rate.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132701088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Robust speech recognition with on-line unsupervised acoustic feature compensation
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430092
L. Buera, A. Miguel, Eduardo Lleida Solano, Oscar Saz-Torralba, A. Ortega
An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model-based linear normalization with a GMM-based cross-probability model (MEMLIN CPM) and a novel acoustic model adaptation method based on rotation transformations. A set of rotation transformations is estimated by linear regression, in an unsupervised process, from clean and MEMLIN CPM-normalized training data. In testing, each MEMLIN CPM-normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference models and the set of rotation transformations. The proposed solution was evaluated on the Spanish SpeechDat Car database: over standard ETSI front-end parameters, MEMLIN CPM achieves an average WER improvement of 83.89%, and the hybrid technique raises this to 92.07%. On the Aurora 2 database, the hybrid technique obtains an average improvement of 68.88% with clean training.
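As a rough illustration of the feature-normalization step (a sketch only: MEMLIN's cross-probability model is more elaborate, and the GMM parameters, environment set, and per-component bias vectors r below are placeholders), each frame is shifted by a posterior-weighted combination of environment- and component-dependent biases:

import numpy as np

def _gauss(x, comp):
    """Weighted likelihood of frame x under one diagonal-covariance Gaussian."""
    d = x - comp["mu"]
    return (comp["w"] * np.exp(-0.5 * np.sum(d * d / comp["var"]))
            / np.sqrt(np.prod(2.0 * np.pi * comp["var"])))

def compensate_frame(x, envs):
    """Shift float feature vector x by a posterior-weighted sum of
    per-component bias vectors r (clean-minus-noisy offsets, learned
    beforehand). Each env is {"components": [{"w","mu","var","r"}, ...]}."""
    per_env = [np.array([_gauss(x, c) for c in e["components"]]) for e in envs]
    env_mass = np.array([lk.sum() for lk in per_env])
    p_env = env_mass / env_mass.sum()            # p(e | x)
    shift = np.zeros_like(x)
    for pe, lk, e in zip(p_env, per_env, envs):
        p_comp = lk / lk.sum()                   # p(s | x, e)
        for ps, comp in zip(p_comp, e["components"]):
            shift += pe * ps * comp["r"]         # posterior-weighted bias
    return x - shift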
{"title":"Robust speech recognition with on-line unsupervised acoustic feature compensation","authors":"L. Buera, A. Miguel, EDUARDO LLEIDA SOLANO, Oscar Saz-Torralba, A. Ortega","doi":"10.1109/ASRU.2007.4430092","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430092","url":null,"abstract":"An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model based linear normalization with cross-probability model based on GMMs (MEMLIN CPM) with a novel acoustic model adaptation method based on rotation transformations. Hence, a set of rotation transformations is estimated with clean and MEMLIN CPM-normalized training data by linear regression in an unsupervised process. Thus, in testing, each MEMLIN CPM normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference ones and the set of rotation transformations. To test the proposed solution, some experiments with Spanish SpeechDat Car database were carried out. MEMLIN CPM over standard ETSI front-end parameters reaches 83.89% of average improvement in WER, while the introduced hybrid solution goes up to 92.07%. Also, the proposed hybrid technique was tested with Aurora 2 database, obtaining an average improvement of 68.88% with clean training.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133609156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A multi-layer architecture for semi-synchronous event-driven dialogue management
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430165
Antoine Raux, M. Eskénazi
We present a new architecture for spoken dialogue systems that explicitly separates the discrete, abstract representation used in the high-level dialogue manager from the continuous, real-time nature of real-world events. We propose the concept of the conversational floor as a means of synchronizing the internal state of the dialogue manager with the real world, and introduce a new component, the Interaction Manager, to act as the interface between the two layers. The proposed architecture was implemented as a new version of the Olympus framework, which can be used across different domains and modalities. We confirmed the practicality of the approach by porting Let's Go, an existing deployed dialogue system, to the new architecture.
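One way to picture the layering (a generic sketch of the pattern, not the Olympus implementation): continuous events update a conversational-floor state in real time, and the discrete dialogue manager is consulted only at floor boundaries. All event names and the action interface below are illustrative assumptions:

import enum
import queue

class Floor(enum.Enum):
    USER = "user"
    SYSTEM = "system"
    FREE = "free"

class InteractionManager:
    """Mediates between continuous real-world events and a discrete
    dialogue manager (dm). dm needs one method here:
    next_action(utterance) -> prompt string or None."""

    def __init__(self, dm):
        self.dm = dm
        self.events = queue.Queue()
        self.floor = Floor.FREE

    def on_event(self, kind, payload=None):
        """Called from audio/pen/ASR threads as things happen."""
        self.events.put((kind, payload))

    def run_once(self):
        kind, payload = self.events.get()          # blocks until an event arrives
        if kind == "user_speech_start":
            self.floor = Floor.USER                # user takes the floor; hold the DM
        elif kind == "user_utterance_end":
            self.floor = Floor.FREE
            action = self.dm.next_action(payload)  # discrete DM step
            if action is not None:
                self.floor = Floor.SYSTEM
                return action                      # hand the prompt to synthesis
        elif kind == "system_prompt_done":
            self.floor = Floor.FREE
        return None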
{"title":"A multi-layer architecture for semi-synchronous event-driven dialogue management","authors":"Antoine Raux, M. Eskénazi","doi":"10.1109/ASRU.2007.4430165","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430165","url":null,"abstract":"We present a new architecture for spoken dialogue systems that explicitly separates the discrete, abstract representation used in the high-level dialogue manager and the continuous, real-time nature of real world events. We propose to use the concept of conversational floor as a means to synchronize the internal state of the dialogue manager with the real world. To act as the interface between these two layers, we introduce a new component, called the Interaction Manager. The proposed architecture was implemented as a new version of the Olympus framework, which can be used across different domains and modalities. We confirmed the practicality of the approach by porting Let's Go, an existing deployed dialogue system to the new architecture.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132594350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Implicit user-adaptive system engagement in speech, pen and multimodal interfaces
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430162
S. Oviatt
The present research contributes new empirical findings, theory, and prototyping toward developing implicit user-adaptive techniques for system engagement based exclusively on speech amplitude and pen pressure. The results reveal that people spontaneously adapt their communicative energy level reliably, substantially, and in different modalities to designate and repair an intended interlocutor in a computer-mediated group setting. Furthermore, this behavior alone can be harnessed to achieve system engagement accuracies in the 75-86% range. In short, a high level of correct system engagement was achieved based exclusively on implicit cues in users' energy level during communication.
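The paper reports accuracies rather than a decision rule; as a toy illustration of the underlying idea, an utterance can be flagged as system-directed when its amplitude rises above the speaker's own running baseline (the margin and smoothing constants below are arbitrary placeholders, not values from the study):

class EnergyEngagementDetector:
    """Flags an utterance as system-directed when its level exceeds the
    speaker's running baseline by a margin. margin_db and alpha are
    illustrative placeholders."""

    def __init__(self, margin_db=1.5, alpha=0.9):
        self.margin_db = margin_db
        self.alpha = alpha           # smoothing for the per-speaker baseline
        self.baseline_db = None

    def observe(self, utterance_db):
        if self.baseline_db is None:
            self.baseline_db = utterance_db   # first utterance sets the baseline
            return False
        engaged = utterance_db > self.baseline_db + self.margin_db
        # track the speaker's habitual level with an exponential moving average
        self.baseline_db = (self.alpha * self.baseline_db
                            + (1.0 - self.alpha) * utterance_db)
        return engaged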
{"title":"Implicit user-adaptive system engagement in speech, pen and multimodal interfaces","authors":"S. Oviatt","doi":"10.1109/ASRU.2007.4430162","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430162","url":null,"abstract":"The present research contributes new empirical research, theory, and prototyping toward developing implicit user-adaptive techniques for system engagement based exclusively on speech amplitude and pen pressure. The results reveal that people will spontaneously adapt their communicative energy level reliably, substantially, and in different modalities to designate and repair an intended interlocutor in a computer-mediated group setting. Furthermore, this sole behavior can be harnessed to achieve system engagement accuracies in the 75 - 86 % range. In short, there was a high level of correct system engagement based exclusively on implicit cues in users' energy level during communication.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115103956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430193
Yi-Cheng Pan, Hung-lin Chang, Lin-Shan Lee
In this paper we analytically compare two widely accepted approaches to spoken document indexing, position-specific posterior lattices (PSPL) and confusion networks (CN), in terms of retrieval accuracy and index size. The fundamental distinctions between the two approaches in construction units, posterior probabilities, number of clusters, indexing coverage, and space requirements are discussed in detail. A new approach to approximating subword posterior probabilities in a word lattice is also incorporated into PSPL/CN to handle OOV and rare-word problems, which were unaddressed in the original PSPL and CN approaches. Extensive experiments on Chinese broadcast news segments indicate that PSPL offers higher accuracy than CN but requires much larger disk space, while subword-based PSPL turns out to be very attractive because it lowers the storage cost while offering even higher accuracy.
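At its core, a PSPL is an inverted index whose postings carry a position and a posterior probability rather than a hard occurrence; a minimal in-memory sketch of such an index and a soft unigram lookup (the data layout is an assumption; the full PSPL scoring also exploits position adjacency for n-gram matches):

from collections import defaultdict

class PSPLIndex:
    """Inverted index over (document, position) bins: add() accumulates
    the posterior probability that `unit` (a word, or a subword unit in
    the subword variant) occupies position `pos` in `doc`."""

    def __init__(self):
        self.postings = defaultdict(list)   # unit -> [(doc, pos, posterior)]

    def add(self, unit, doc, pos, posterior):
        self.postings[unit].append((doc, pos, posterior))

    def unigram_scores(self, query_units):
        """Soft hit count per document for a bag of query units."""
        scores = defaultdict(float)
        for u in query_units:
            for doc, _pos, p in self.postings[u]:
                scores[doc] += p
        return dict(scores)

The subword variant indexes character- or syllable-level units instead of words, which is what lets it recover OOV and rare-word queries at lower storage cost.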
{"title":"Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing","authors":"Yi-Cheng Pan, Hung-lin Chang, Lin-Shan Lee","doi":"10.1109/ASRU.2007.4430193","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430193","url":null,"abstract":"In this paper we analytically compare the two widely accepted approaches of spoken document indexing, position specific posterior lattices (PSPL) and confusion network (CN), in terms of retrieval accuracy and index size. The fundamental distinctions between these two approaches in terms of construction units, posterior probabilities, number of clusters, indexing coverage and space requirements are discussed in detail. A new approach to approximate subword posterior probability in a word lattice is also incorporated in PSPL/CN to handle OOV/rare word problems, which were unaddressed in original PSPL and CN approaches. Extensive experimental results on Chinese broadcast news segments indicate that PSPL offers higher accuracy than CN but requiring much larger disk space, while subword-based PSPL turns out to be very attractive because it lowers the storage cost while offers even higher accuracies.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114589185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Type-II dialogue systems for information access from unstructured knowledge sources
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430170
Yi-Cheng Pan, Lin-Shan Lee
In this paper, we present a formulation and framework for a new type of dialogue system, referred to here as type-II dialogue systems. The distinguishing feature of such systems is that they access information from unstructured knowledge sources, lacking a well-organized back-end database to supply information to the user; typical example tasks include retrieval, browsing, and question answering. Mainstream dialogue systems with a well-organized back-end database are referred to as type-I systems. The functionalities of each module in type-II dialogue systems are analyzed, presented, and compared with the corresponding modules in type-I systems. A preliminary type-II dialogue system recently developed at National Taiwan University is presented at the end as a typical example.
{"title":"Type-II dialogue systems for information access from unstructured knowledge sources","authors":"Yi-Cheng Pan, Lin-Shan Lee","doi":"10.1109/ASRU.2007.4430170","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430170","url":null,"abstract":"In this paper, we present a new formulation and a new framework for a new type of dialogue system, referred to as the type-II dialogue systems in this paper. The distinct feature of such dialogue systems is their tasks of information access from unstructured knowledge sources, or the lack of a well-organized back-end database offering the information for the user. Typical example tasks of this type of dialogue systems include retrieval, browsing and question answering. The mainstream dialogue systems with a well-organized back-end database are then referred to as type-I dialogue systems here in the paper. The functionalities of each module in such type-II dialogue systems are analyzed, presented, and compared with the respective modules in type-I dialogue systems. A preliminary type-II dialogue system recently developed in National Taiwan University is also presented at the end as a typical example.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128295018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient combination of parametric spaces, models and metrics for speaker diarization
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430120
Themos Stafylakis, V. Katsouros, G. Carayannis
In this paper we present a method for combining several acoustic parametric spaces, statistical models, and distance metrics in the speaker diarization task. Focusing on the post-segmentation part of the problem, we adopt an incremental feature selection and fusion algorithm, based on the maximum entropy principle and the iterative scaling algorithm, that combines several statistical distance measures on pairs of speech chunks. This approach places the merging-of-chunks clustering process in a probabilistic framework. We also propose a decomposition of the input space according to gender, recording conditions, and chunk lengths. The algorithm produces highly competitive results compared to state-of-the-art GMM-UBM methods.
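For a binary merge/don't-merge decision, a maximum entropy model reduces to an exponential (logistic) combination of the per-pair distance features, with weights trainable by iterative scaling; a sketch of the scoring step (feature names and weights below are hypothetical):

import math

def merge_probability(features, weights):
    """Maximum-entropy combination of distance measures for a pair of
    speech chunks; `weights` would be learned by iterative scaling."""
    z = sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))        # p(merge | chunk pair)

# hypothetical measures and weights for one chunk pair
pair = {"bic": -12.3, "glr": 0.8, "kl2": 1.9}
lambdas = {"bic": -0.05, "glr": 1.2, "kl2": -0.4}
merge = merge_probability(pair, lambdas) > 0.5   # merge into one speaker cluster?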
{"title":"Efficient combination of parametric spaces, models and metrics for speaker diarization1","authors":"Themos Stafylakis, V. Katsouros, G. Carayannis","doi":"10.1109/ASRU.2007.4430120","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430120","url":null,"abstract":"In this paper we present a method of combining several acoustic parametric spaces, statistical models and distance metrics in speaker diarization task. Focusing our interest on the post-segmentation part of the problem, we adopt an incremental feature selection and fusion algorithm based on the Maximum Entropy Principle and Iterative Scaling Algorithm that combines several statistical distance measures on speech-chunk pairs. By this approach, we place the merging-of-chunks clustering process into a probabilistic framework. We also propose a decomposition of the input space according to gender, recording conditions and chunk lengths. The algorithm produced highly competitive results compared to GMM-UBM state-of-the-art methods.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133965292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Mandarin lecture speech transcription system for speech summarization
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430157
R. Chan, J. Zhang, Pascale Fung, Lu Cao
This paper introduces our work on Mandarin lecture speech transcription. In particular, we present our work on a small database containing only 16 hours of audio data and 0.16 M words of text data. A range of experiments were carried out to improve the acoustic model and the language model, including adapting the lecture speech data to the read speech data for acoustic modeling, and using lecture conference papers, presentation slides, and web data from similar domains for language modeling. We also study the effects of automatic segmentation, unsupervised acoustic model adaptation, and language model adaptation in our recognition system. Using a 3×RT multi-pass decoding strategy, our final system obtains 70.3% recognition accuracy. Finally, feeding the transcripts into an SVM-based summarizer yields a ROUGE-L F-measure of 66.5%.
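ROUGE-L, the summarization metric reported above, scores a candidate summary against a reference by their longest common subsequence (LCS); a compact reference implementation of the F-measure (beta controls the recall weighting; 1.0 here is a neutral choice, not necessarily the paper's setting):

def rouge_l_f(candidate, reference, beta=1.0):
    """ROUGE-L: F-measure over LCS length between two token lists."""
    m, n = len(candidate), len(reference)
    # dynamic-programming LCS table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / m, lcs / n
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)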
{"title":"A Mandarin lecture speech transcription system for speech summarization","authors":"R. Chan, J. Zhang, Pascale Fung, Lu Cao","doi":"10.1109/ASRU.2007.4430157","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430157","url":null,"abstract":"This paper introduces our work on mandarin lecture speech transcription. In particular, we present our work on a small database, which contains only 16 hours of audio data and 0.16 M words of text data. A range of experiments have been done to improve the performances of the acoustic model and the language model, these include adapting the lecture speech data to the reading speech data for acoustic modeling and the use of lecture conference paper, power points and similar domain web data for language modeling. We also study the effects of automatic segmentation, unsupervised acoustic model adaptation and language model adaptation in our recognition system. By using a 3timesRT multiple passes decoding strategy, we obtain 70.3% accuracy performance in our final system. Finally, we apply our speech transcription system into a SVM summarizer and obtain a ROUGE-L F-measure of 66.5%.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116059809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The GALE project: A description and an update
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430115
Jordan Cohen
Summary form only given. GALE (Global Autonomous Language Exploitation) is a DARPA program to develop and apply computer software technologies to absorb, translate, analyze, and interpret huge volumes of speech and text in multiple languages. The program has been active for two years, and the GALE contractors have been developing highly robust speech recognition, machine translation, and information delivery systems for Chinese and Arabic. Several talks in this workshop present work developed under GALE. This overview talk reviews the program goals, the technical highlights, and the technical issues remaining in the GALE project.
{"title":"The GALE project: A description and an update","authors":"Jordan Cohen","doi":"10.1109/ASRU.2007.4430115","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430115","url":null,"abstract":"Summary form only given. The GALE (global autonomous language exploitation) program is a DARPA program to develop and apply computer software technologies to absorb, translate, analyze, and interpret huge volumes of speech and text in multiple languages This program has been active for two years, and the GALE contractors have been engaged in developing highly robust speech recognition, machine translation, and information delivery systems in Chinese and Arabic. Several GALE-developed talks will be given in this workshop. This overview talk will review the program goals, the technical highlights, and the technical issues remaining in the GALE project.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123449833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Spoken document summarization using relevant information
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430107
Yi-Ting Chen, Shih-Hsiang Lin, H. Wang, Berlin Chen
Extractive summarization typically selects indicative sentences from a document automatically, according to a target summarization ratio, and then sequences them to form a summary. In this paper, we investigate the use of information from relevant documents, retrieved from a contemporary text collection for each sentence of the spoken document to be summarized, within a probabilistic generative framework for extractive spoken document summarization. In the proposed methods, the probability of a document being generated by a sentence is modeled by a hidden Markov model (HMM), while the retrieved relevant text documents are used to estimate the HMM's parameters and the sentence's prior probability. Experiments on Chinese broadcast news compiled in Taiwan show that the new methods outperform the previous HMM approach.
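The generative score in this family of methods treats each sentence as a two-state generator of the document's words, interpolating the sentence's own unigram model with a background model; a minimal sketch (the interpolation weight lam and the paper's relevance-based parameter estimation are not reproduced here):

import math
from collections import Counter

def sentence_score(sentence, document, bg_prob, lam=0.7):
    """log P(D | S) under an interpolated unigram model: each document
    word is generated either by the sentence model (weight lam) or by a
    background model (weight 1 - lam). Higher scores -> more indicative."""
    counts = Counter(sentence)
    total = len(sentence)
    score = 0.0
    for w in document:
        p_sent = counts[w] / total if total else 0.0
        score += math.log(lam * p_sent + (1 - lam) * bg_prob.get(w, 1e-9))
    return score

def summarize(sentences, bg_prob, ratio=0.3):
    """Keep the top `ratio` of sentences by score, in original order."""
    doc = [w for s in sentences for w in s]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i], doc, bg_prob),
                    reverse=True)
    keep = set(ranked[:max(1, int(ratio * len(sentences)))])
    return [sentences[i] for i in sorted(keep)]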
{"title":"Spoken document summarization using relevant information","authors":"Yi-Ting Chen, Shih-Hsiang Lin, H. Wang, Berlin Chen","doi":"10.1109/ASRU.2007.4430107","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430107","url":null,"abstract":"Extractive summarization usually automatically selects indicative sentences from a document according to a certain target summarization ratio, and then sequences them to form a summary. In this paper, we investigate the use of information from relevant documents retrieved from a contemporary text collection for each sentence of a spoken document to be summarized in a probabilistic generative framework for extractive spoken document summarization. In the proposed methods, the probability of a document being generated by a sentence is modeled by a hidden Markov model (HMM), while the retrieved relevant text documents are used to estimate the HMM's parameters and the sentence's prior probability. The results of experiments on Chinese broadcast news compiled in Taiwan show that the new methods outperform the previous HMM approach.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122362309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}