
2008 IEEE Spoken Language Technology Workshop: Latest Publications

Speech synthesis using approximate matching of syllables
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777834
E. V. Raghavendra, B. Yegnanarayana, K. Prahallad
In this paper we propose a technique for a syllable-based speech synthesis system. While syllable-based synthesizers produce better-sounding speech than diphone- and phone-based ones, covering all syllables of a language is a non-trivial issue. We address syllable coverage by approximating a syllable with a similar one when the required syllable is not found. To verify our hypothesis, we conducted perceptual studies on manually modified sentences and found that the assumption is valid. Similar approximations have been used in speech synthesis, and the results show that they produce intelligible speech of better quality than diphone units.
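The fallback idea in the abstract can be sketched as follows: when a required syllable is missing from the unit inventory, substitute the most similar recorded syllable. The similarity criterion below (difflib's matching-ratio over phone strings) and the example syllables are illustrative assumptions, not the paper's actual selection rule.

```python
from difflib import SequenceMatcher

def best_syllable(required, inventory):
    """Return the inventory syllable closest to the required one
    (hypothetical similarity criterion over phone strings)."""
    if required in inventory:
        return required
    # Fall back to the most similar recorded syllable.
    return max(inventory,
               key=lambda s: SequenceMatcher(None, required, s).ratio())

# Hypothetical syllables: 'kaa' is missing from the inventory,
# so the closest available unit is substituted.
inventory = ["ka", "kaa_n", "pa", "maa"]
print(best_syllable("kaa", inventory))  # closest match: 'ka'
```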
Citations: 16
A similar content retrieval method for podcast episodes
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777899
Junta Mizuno, J. Ogata, Masataka Goto
Given podcasts (audio blogs) that are sets of speech files called episodes, this paper describes a method for retrieving episodes that have similar content. Although most previous retrieval methods were based on bibliographic information, tags, or users' playback behaviors without considering spoken content, our method can compute content-based similarity based on speech recognition results of podcast episodes even if the recognition results include some errors. To overcome those errors, it converts intermediate speech-recognition results to a confusion network containing competitive candidates, and then computes the similarity by using keywords extracted from the network. Experimental results with episodes that have different word accuracy and content showed that keywords obtained from competitive candidates were useful in retrieving similar episodes. To show relevant episodes, our method will be incorporated into PodCastle, a public web service that provides full-text searching of podcasts on the basis of speech recognition.
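The final similarity step can be illustrated with a toy measure: once keywords have been extracted from each episode's confusion network, similarity reduces to set overlap. The cosine-style normalization and the keyword lists below are assumptions for illustration; the paper's exact weighting may differ.

```python
def keyword_similarity(kw_a, kw_b):
    """Cosine-style overlap between two keyword sets extracted from
    confusion networks (a simplified stand-in for the paper's measure)."""
    a, b = set(kw_a), set(kw_b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5

# Hypothetical keywords from two episodes' confusion networks.
ep1 = ["speech", "recognition", "podcast", "search"]
ep2 = ["speech", "podcast", "music", "retrieval"]
print(round(keyword_similarity(ep1, ep2), 3))  # 2 shared / sqrt(4*4) = 0.5
```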
Citations: 10
Evaluating the effectiveness of features and sampling in extractive meeting summarization
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777864
Shasha Xie, Yang Liu, Hui-Ching Lin
Feature-based approaches are widely used in the task of extractive meeting summarization. In this paper, we analyze and evaluate the effectiveness of different types of features using forward feature selection in an SVM classifier. In addition to features used in prior studies, we introduce topic related features and demonstrate that these features are helpful for meeting summarization. We also propose a new way to resample the sentences based on their salience scores for model training and testing. The experimental results on both the human transcripts and recognition output, evaluated by the ROUGE summarization metrics, show that feature selection and data resampling help improve the system performance.
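Forward feature selection, as used here with the SVM classifier, can be sketched generically: greedily add whichever feature most improves a scoring function and stop when nothing helps. The scoring function and feature names below are toy stand-ins for the paper's ROUGE-based evaluation.

```python
def forward_select(features, score_fn, max_k=None):
    """Greedy forward feature selection: repeatedly add the feature that
    most improves score_fn(selected); stop when no candidate helps."""
    selected, best = [], float("-inf")
    while features and (max_k is None or len(selected) < max_k):
        cand = max(features, key=lambda f: score_fn(selected + [f]))
        cand_score = score_fn(selected + [cand])
        if cand_score <= best:
            break
        selected.append(cand)
        features = [f for f in features if f != cand]
        best = cand_score
    return selected

# Toy scorer: hypothetical per-feature gains with a redundancy penalty.
gains = {"length": 0.4, "tfidf": 0.5, "topic": 0.3, "position": 0.1}
score = lambda fs: sum(gains[f] for f in fs) - 0.25 * max(0, len(fs) - 2)
print(forward_select(list(gains), score))  # ['tfidf', 'length', 'topic']
```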
Citations: 51
Sub-word modeling of out of vocabulary words in spoken term detection
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777893
Igor Szöke, L. Burget, J. Černocký, M. Fapšo
This paper compares sub-word-based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first, and two constrained methods of multigram training were then proposed. We evaluated on the NIST STD06 dev-set CTS data. The proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
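The role of sub-word units in STD can be illustrated with a minimal sketch: an OOV query is converted to a phone (or multigram) sequence and located in the recognized phone string of an utterance. Real systems search lattices with inexact matching; the exact-match scan and phone strings below are simplifying assumptions.

```python
def find_term(term_phones, utterance_phones):
    """Naive sub-word STD: return indices where the query's phone
    sequence occurs in a recognized phone string."""
    n, hits = len(term_phones), []
    for i in range(len(utterance_phones) - n + 1):
        if utterance_phones[i:i + n] == term_phones:
            hits.append(i)
    return hits

# Hypothetical phone strings: search for the OOV word "brno" ~ [b r n o].
utt = "sil dh ax b r n o m iy t ih ng sil".split()
print(find_term("b r n o".split(), utt))  # [3]
```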
Citations: 73
Unexplored directions in spoken language technology for development
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777825
F. Weber, Kalika Bali, R. Rosenfeld, K. Toyama
The full range of possibilities for spoken-language technologies (SLTs) to impact poor communities has been investigated only partially, despite what appears to be strong potential. Voice interfaces raise fewer barriers for the illiterate, require less training to use, and are a natural choice for applications on cell phones, which have far greater penetration in the developing world than PCs. At the same time, critical lessons of existing technology projects in development still apply and require careful attention. We suggest how to expand the view of SLT for development, and discuss how its potential can realistically be explored.
Citations: 11
An analysis of grammatical errors in non-native speech in English
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777847
J. Lee, S. Seneff
While a wide variety of grammatical mistakes may be observed in the speech of non-native speakers, the types and frequencies of these mistakes are not random. Certain parts of speech, for example, have been shown to be especially problematic for Japanese learners of English [1]. Modeling these errors can potentially enhance the performance of computer-assisted language learning systems. This paper presents an automatic method to estimate an error model from a non-native English corpus, focusing on articles and prepositions. A fine-grained analysis is achieved by conditioning the errors on appropriate words in the context.
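Estimating an error model of the kind described can be sketched as counting: given pairs of (intended, actually-produced) articles from an annotated corpus, the conditional error probabilities are relative frequencies. The toy data and the 'null' symbol for a dropped article are illustrative assumptions.

```python
from collections import Counter

def article_error_model(pairs):
    """Estimate P(written | intended) for articles from a corpus of
    (intended, actually-written) article pairs."""
    counts = Counter(pairs)
    totals = Counter(intended for intended, _ in pairs)
    return {(i, w): c / totals[i] for (i, w), c in counts.items()}

# Hypothetical learner data: 'null' marks a dropped article.
pairs = [("the", "the"), ("the", "null"), ("the", "a"), ("the", "the"),
         ("a", "a"), ("a", "null"), ("a", "a"), ("a", "a")]
model = article_error_model(pairs)
print(model[("the", "null")], model[("a", "a")])  # 0.25 0.75
```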
Citations: 36
Phonetic name matching for cross-lingual Spoken Sentence Retrieval
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777895
Heng Ji, R. Grishman, Wen Wang
Cross-lingual spoken sentence retrieval (CLSSR) remains a challenge, especially for queries that include OOV words such as person names. This paper proposes a simple method of fuzzy matching between query names and the phone sequences of candidate audio segments. This approach has the advantage of avoiding some word decoding errors in automatic speech recognition (ASR). Experiments on Mandarin-English CLSSR show that phone-based searching and conventional translation-based searching are complementary. Adding phone matching achieved a 26.29% improvement in F-measure over searching on state-of-the-art machine translation (MT) output, and 8.83% over entity translation (ET) output.
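A minimal version of the phone-level fuzzy matching might score a query name against a candidate segment by normalized edit distance over phone sequences. The phonetizations below are hypothetical; the paper's actual scoring may weight phone confusions differently.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

def name_match_score(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical phones."""
    return 1.0 - phone_edit_distance(a, b) / max(len(a), len(b))

# Hypothetical phonetizations of the same name from two systems.
q = "m ah hh ae m ax d".split()
c = "m ow hh ae m ax d".split()
print(round(name_match_score(q, c), 3))  # one substitution in 7 phones
```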
Citations: 11
Bob: A lexicon and pronunciation dictionary generator
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777879
V. Wan, J. Dines, A. Hannani, Thomas Hain
This paper presents Bob, a tool for managing lexicons and generating pronunciation dictionaries for automatic speech recognition systems. It aims to maintain a high level of consistency between lexicons and language-modelling corpora by managing the text normalisation and lexicon generation processes in a single dedicated package. It also aims to maintain consistent pronunciation dictionaries by generating pronunciation hypotheses automatically and aiding their verification. The tool's design and functionality are described, and two case studies highlighting the importance of consistency and illustrating the use of the tool are reported.
Citations: 8
Automatic identification of gender & accent in spoken Hindi utterances with regional Indian accents
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777902
Kamini Malhotra, A. Khosla
In the past, significant effort has been focused on automatic extraction of information from speech signals. Most techniques have aimed at automatic speech recognition or speaker identification; automatic accent identification (AID) has received far less attention. This paper presents an approach to identifying the gender and accent of a speaker using Gaussian mixture modelling. The proposed approach is text-independent and distinguishes four regional Indian accents in spoken Hindi (Kashmiri, Manipuri, Bengali and neutral Hindi) while also identifying the speaker's gender. The Gaussian mixture model (GMM) approach removes the need for speech segmentation during training and makes the system very simple to implement. When gender-dependent GMMs are used, the accent identification score improves and gender is also correctly recognized. The results show that GMMs lend themselves very well to the accent and gender identification task. Spectral features are incorporated in the form of mel-frequency cepstral coefficients (MFCCs). The approach can be extended to incorporate other regional accents in a very simple way.
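The GMM classification step can be sketched with a one-component, diagonal-covariance model per class; a real system would use many mixture components and genuine MFCC vectors. Each class model is fit on training vectors, and a test vector is assigned to the class with the highest log-likelihood. All feature values below are toy numbers.

```python
import math

def fit_diag_gaussian(vectors):
    """Per-dimension mean/variance (a 1-component stand-in for a GMM)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    var = [max(sum((v[i] - mean[i]) ** 2 for v in vectors) / n, 1e-6)
           for i in range(d)]
    return mean, var

def log_likelihood(x, model):
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify(x, models):
    """Pick the class whose model gives the feature vector the highest
    log-likelihood, as in GMM-based gender/accent identification."""
    return max(models, key=lambda c: log_likelihood(x, models[c]))

# Toy 2-D 'MFCC-like' features per class (illustrative numbers only).
models = {
    "male": fit_diag_gaussian([[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]]),
    "female": fit_diag_gaussian([[3.0, 4.0], [3.1, 4.2], [2.9, 3.9]]),
}
print(classify([1.1, 2.0], models))  # male
```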
Citations: 20
Sequential system combination for machine translation of speech
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777889
D. Karakos, S. Khudanpur
System combination is a technique which has been shown to yield significant gains in speech recognition and machine translation. Most combination schemes align the outputs of different systems to produce lattices (or confusion networks), from which a composite hypothesis is chosen, possibly with the help of a large language model. The benefit of this approach is two-fold: (i) whenever many systems agree on a set of words, the combination output contains those words with high confidence; and (ii) whenever the systems disagree, the language model resolves the ambiguity based on the (probably correct) agreed-upon context. Machine translation system combination is more challenging because the translations differ in word order: the alignment has to incorporate computationally expensive movements of word blocks. In this paper, we show how to combine translation outputs efficiently, extending the incremental alignment procedure of Rosti et al. (2008). A comparison of different system combination design choices is performed on an Arabic speech translation task.
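The voting stage of system combination can be illustrated under a strong simplifying assumption: the hard part, aligning translations with word reordering, is taken as already done, leaving a positionwise majority vote over the aligned outputs (ROVER-style). The example outputs are hypothetical.

```python
from collections import Counter

def rover_vote(aligned_outputs):
    """Positionwise majority vote over already-aligned system outputs;
    '' marks an alignment gap and is dropped if it wins a slot."""
    combined = []
    for slot in zip(*aligned_outputs):
        word, _ = Counter(slot).most_common(1)[0]
        if word:
            combined.append(word)
    return " ".join(combined)

# Three hypothetical system outputs after alignment.
systems = [
    "the talks began on monday".split(),
    "the talks start on monday".split(),
    "the discussions began on monday".split(),
]
print(rover_vote(systems))  # the talks began on monday
```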
Citations: 6