{"title":"Exploring an unsupervised, language independent, spoken document retrieval system","authors":"Alexandru Caranica, H. Cucu, Andi Buzo","doi":"10.1109/CBMI.2016.7500262","DOIUrl":null,"url":null,"abstract":"With the increasing availability of spoken documents in different languages, there is a need of systems performing automatic and unsupervised search on audio streams, containing speech, in a document retrieval scenario. We are interested in retrieving information from multilingual speech data, from spoken documents such as broadcast news, video archives or even telephone conversations. The ultimate goal of a Spoken Document Retrieval System is to enable vocabulary-independent search over large collections of speech content, to find written or spoken “queries” or reoccurring speech data. If the language is known, the task is relatively simple. One could use a large vocabulary continuous speech recognition (LVCSR) tool to produce highly accurate word transcripts, which are then indexed and query terms are retrieved from the index. However, if the language is unknown, hence queries are not part of the recognizers vocabulary, the relevant audio documents cannot be retrieved. Thus, search metrics are affected, and documents retrieved are no longer relevant to the user. In this paper we investigate whether the use of input features derived from multi-language resources helps the process of unsupervised spoken term detection, independent of the language. Moreover, we explore the use of multi objective search, by combining both language detection and LVCSR based search, with unsupervised Spoken Term Detection (STD). In order to achieve this, we make use of multiple open-source tools and in-house acoustic and language models, to propose a language independent spoken document retrieval system.","PeriodicalId":356608,"journal":{"name":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMI.2016.7500262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
With the increasing availability of spoken documents in different languages, there is a need of systems performing automatic and unsupervised search on audio streams, containing speech, in a document retrieval scenario. We are interested in retrieving information from multilingual speech data, from spoken documents such as broadcast news, video archives or even telephone conversations. The ultimate goal of a Spoken Document Retrieval System is to enable vocabulary-independent search over large collections of speech content, to find written or spoken “queries” or reoccurring speech data. If the language is known, the task is relatively simple. One could use a large vocabulary continuous speech recognition (LVCSR) tool to produce highly accurate word transcripts, which are then indexed and query terms are retrieved from the index. However, if the language is unknown, hence queries are not part of the recognizers vocabulary, the relevant audio documents cannot be retrieved. Thus, search metrics are affected, and documents retrieved are no longer relevant to the user. In this paper we investigate whether the use of input features derived from multi-language resources helps the process of unsupervised spoken term detection, independent of the language. Moreover, we explore the use of multi objective search, by combining both language detection and LVCSR based search, with unsupervised Spoken Term Detection (STD). In order to achieve this, we make use of multiple open-source tools and in-house acoustic and language models, to propose a language independent spoken document retrieval system.