Using proxies for OOV keywords in the keyword search task

2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Pub Date: 2013-12-01. DOI: 10.1109/ASRU.2013.6707766

Guoguo Chen, Oguz Yilmaz, J. Trmal, Daniel Povey, S. Khudanpur

Citations: 100

Abstract

We propose a simple but effective framework, based on weighted finite-state transducers (WFSTs), for handling out-of-vocabulary (OOV) keywords in a speech search task. State-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) systems are developed for conversational telephone speech in Tagalog. Word-based and phone-based indexes are created from word lattices, the latter by using the LVCSR system's pronunciation lexicon. Pronunciations of OOV keywords are hypothesized via a standard grapheme-to-phoneme method. In-vocabulary proxies (word or phone sequences) are generated for each OOV keyword using WFST techniques that permit incorporation of a phone confusion matrix. Empirical results when searching for the Babel/NIST evaluation keywords in the Babel 10-hour development-test speech collection show that (i) searching for word proxies in the word index significantly outperforms searching for phonetic representations of OOV words in a phone index, and (ii) while phone confusion information yields only a minor improvement when searching a phone index, it yields up to 40% improvement in actual term weighted value when searching a word index with word proxies.
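
The idea of a word proxy can be illustrated with a small sketch. This is not the authors' WFST pipeline: it replaces transducer composition with a plain weighted edit distance, and the lexicon entries, phone symbols, and substitution costs below are hypothetical toy values (in a real system the costs might be derived from a phone confusion matrix, for instance as negative log probabilities). It only shows how confusion-aware costs let an OOV pronunciation be mapped to acoustically similar in-vocabulary words.

```python
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]

# Hypothetical toy lexicon: in-vocabulary word -> phone sequence.
LEXICON: Dict[str, Phones] = {
    "bata": ("b", "a", "t", "a"),
    "pata": ("p", "a", "t", "a"),
    "mata": ("m", "a", "t", "a"),
    "bato": ("b", "a", "t", "o"),
}

# Hypothetical substitution costs (source phone, target phone) -> cost.
# Confusable pairs are cheap; any pair not listed costs 1.0.
SUB_COST: Dict[Tuple[str, str], float] = {
    ("d", "t"): 0.2,
    ("b", "p"): 0.3,
}


def weighted_edit_distance(
    src: Phones,
    tgt: Phones,
    sub_cost: Dict[Tuple[str, str], float],
    ins_del_cost: float = 1.0,
) -> float:
    """Cost of turning src into tgt with confusion-aware substitutions."""
    rows, cols = len(src) + 1, len(tgt) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * ins_del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_del_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = src[i - 1], tgt[j - 1]
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + ins_del_cost,  # delete a phone from src
                dist[i][j - 1] + ins_del_cost,  # insert a phone from tgt
                dist[i - 1][j - 1] + sub,       # match or substitute
            )
    return dist[-1][-1]


def word_proxies(
    oov_pron: Phones,
    lexicon: Dict[str, Phones],
    sub_cost: Dict[Tuple[str, str], float],
    n_best: int = 2,
) -> List[Tuple[str, float]]:
    """Return the n_best in-vocabulary words closest to the OOV pronunciation."""
    scored = sorted(
        (weighted_edit_distance(oov_pron, pron, sub_cost), word)
        for word, pron in lexicon.items()
    )
    return [(word, cost) for cost, word in scored[:n_best]]


if __name__ == "__main__":
    # Hypothetical OOV keyword whose G2P pronunciation is "b a d a".
    for word, cost in word_proxies(("b", "a", "d", "a"), LEXICON, SUB_COST):
        print(f"{word}\t{cost:.2f}")
```

The selected proxies would then be searched in the word index in place of the OOV keyword. A WFST formulation generalizes this sketch by allowing proxies that span several words and by pruning the search space during composition.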
