I-vector based language modeling for spoken document retrieval

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2014-05-04. DOI: 10.1109/ICASSP.2014.6854974

Kuan-Yu Chen, Hung-Shin Lee, H. Wang, Berlin Chen, Hsin-Hsi Chen

Pages: 7083-7088

Citations: 17

Abstract

As more and more multimedia data associated with spoken documents have become publicly available, spoken document retrieval (SDR) has become an important research subject over the past two decades. The i-vector framework has recently been proposed and applied to language identification (LID) and speaker recognition (SR) tasks. Its major contribution is to reduce the sequence of acoustic feature vectors of a speech utterance to a low-dimensional vector representation, to which a number of well-developed postprocessing techniques (such as probabilistic linear discriminant analysis, PLDA) can then be readily and effectively applied. However, to the best of our knowledge, no research to date has applied the i-vector framework to SDR or information retrieval (IR). In this paper, we take a step forward and formulate an i-vector based language modeling (IVLM) framework for SDR. Furthermore, we evaluate the proposed IVLM framework with both inductive and transductive learning strategies. We also exploit multiple levels of index features, including word- and subword-level units, in concert with the proposed framework. The results of SDR experiments conducted on the TDT-2 (Topic Detection and Tracking) collection demonstrate the performance merits of the proposed framework compared with several existing approaches.
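The abstract does not give the IVLM formulation, so the following is only an illustrative sketch of how a low-dimensional document vector could drive a language model for retrieval. It assumes a log-linear (softmax) subspace form, in which each document's term distribution is obtained from a shared bias, a shared low-rank basis, and a per-document vector, and it ranks documents by query log-likelihood. The function names, the toy parameters, and the softmax form itself are assumptions made for illustration; nothing here is trained or taken from the paper.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def document_language_model(bias, basis, doc_vector):
    """Assumed form: P(term | doc) = softmax(bias + basis . doc_vector).

    bias       -- one value per vocabulary term (shared by all documents)
    basis      -- one low-dimensional row per vocabulary term (shared)
    doc_vector -- the low-dimensional vector representing one document
    """
    scores = [
        b + sum(t * v for t, v in zip(row, doc_vector))
        for b, row in zip(bias, basis)
    ]
    return softmax(scores)


def query_log_likelihood(query_term_ids, term_probs):
    """Log-probability of the query terms under one document model."""
    return sum(math.log(term_probs[i]) for i in query_term_ids)


def rank_documents(query_term_ids, bias, basis, doc_vectors):
    """Return document ids sorted from most to least likely for the query."""
    scored = [
        (query_log_likelihood(
            query_term_ids, document_language_model(bias, basis, vec)), doc_id)
        for doc_id, vec in doc_vectors.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy, hand-picked parameters: 3 vocabulary terms, 2-dimensional vectors.
BIAS = [0.0, 0.0, 0.0]
BASIS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
DOC_VECTORS = {"d1": [2.0, 0.0], "d2": [0.0, 2.0]}

ranking = rank_documents([0], BIAS, BASIS, DOC_VECTORS)
```

In this toy setup a query consisting of term 0 ranks `d1` ahead of `d2`, because the vector for `d1` pushes probability mass toward term 0. In a real system the bias, basis, and document vectors would be estimated from data, and the index terms could be words or subword units, as the abstract mentions.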



