Exploring an unsupervised, language independent, spoken document retrieval system

Alexandru Caranica, H. Cucu, Andi Buzo
{"title":"Exploring an unsupervised, language independent, spoken document retrieval system","authors":"Alexandru Caranica, H. Cucu, Andi Buzo","doi":"10.1109/CBMI.2016.7500262","DOIUrl":null,"url":null,"abstract":"With the increasing availability of spoken documents in different languages, there is a need of systems performing automatic and unsupervised search on audio streams, containing speech, in a document retrieval scenario. We are interested in retrieving information from multilingual speech data, from spoken documents such as broadcast news, video archives or even telephone conversations. The ultimate goal of a Spoken Document Retrieval System is to enable vocabulary-independent search over large collections of speech content, to find written or spoken “queries” or reoccurring speech data. If the language is known, the task is relatively simple. One could use a large vocabulary continuous speech recognition (LVCSR) tool to produce highly accurate word transcripts, which are then indexed and query terms are retrieved from the index. However, if the language is unknown, hence queries are not part of the recognizers vocabulary, the relevant audio documents cannot be retrieved. Thus, search metrics are affected, and documents retrieved are no longer relevant to the user. In this paper we investigate whether the use of input features derived from multi-language resources helps the process of unsupervised spoken term detection, independent of the language. Moreover, we explore the use of multi objective search, by combining both language detection and LVCSR based search, with unsupervised Spoken Term Detection (STD). 
In order to achieve this, we make use of multiple open-source tools and in-house acoustic and language models, to propose a language independent spoken document retrieval system.","PeriodicalId":356608,"journal":{"name":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMI.2016.7500262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

With the increasing availability of spoken documents in different languages, there is a need for systems that perform automatic, unsupervised search over speech-bearing audio streams in a document retrieval scenario. We are interested in retrieving information from multilingual speech data, from spoken documents such as broadcast news, video archives, or even telephone conversations. The ultimate goal of a spoken document retrieval system is to enable vocabulary-independent search over large collections of speech content, to find written or spoken “queries” or recurring speech data. If the language is known, the task is relatively simple: one can use a large vocabulary continuous speech recognition (LVCSR) tool to produce highly accurate word transcripts, which are then indexed, and query terms are retrieved from the index. However, if the language is unknown, queries are not part of the recognizer's vocabulary, and the relevant audio documents cannot be retrieved. Search metrics suffer, and the documents returned are no longer relevant to the user. In this paper we investigate whether input features derived from multi-language resources help the process of unsupervised spoken term detection, independent of the language. Moreover, we explore multi-objective search, combining language detection and LVCSR-based search with unsupervised spoken term detection (STD). To achieve this, we combine multiple open-source tools with in-house acoustic and language models to propose a language-independent spoken document retrieval system.
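The LVCSR branch described in the abstract works by indexing word transcripts and retrieving query terms from the index. A minimal sketch of that idea is shown below; the function names and toy transcripts are illustrative assumptions, not part of the paper, which uses real decoder output rather than hand-written word lists.

```python
from collections import defaultdict

def build_index(transcripts):
    """Build an inverted index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for pos, word in enumerate(words):
            index[word.lower()].append((doc_id, pos))
    return index

def search(index, query):
    """Return the ids of documents containing every query term."""
    term_hits = []
    for term in query.lower().split():
        hits = {doc_id for doc_id, _ in index.get(term, [])}
        term_hits.append(hits)
    return set.intersection(*term_hits) if term_hits else set()

# Toy transcripts standing in for LVCSR decoder output.
transcripts = {
    "doc1": ["breaking", "news", "from", "the", "summit"],
    "doc2": ["weather", "news", "update"],
}
index = build_index(transcripts)
print(search(index, "news summit"))  # → {'doc1'}
```

Note that a query term missing from the recognizer's vocabulary never appears in any transcript, so `search` returns an empty set, which is exactly the out-of-vocabulary failure mode that motivates falling back to unsupervised STD.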