
2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI): Latest Publications

Investigating segment-based query expansion for user-generated spoken content retrieval
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500268
Ahmad Khwileh, G. Jones
The very rapid growth in user-generated social multimedia content on online platforms is creating new challenges for search technologies. A significant issue for search of this type of content is its highly variable form and quality. This is compounded by the standard information retrieval (IR) problem of mismatch between search queries and target items. Query Expansion (QE) has been shown to be an effective technique for improving IR effectiveness across multiple search tasks. In QE, words from a number of relevant or assumed-relevant top-ranked documents from an initial search are added to the initial query to enrich it before a further search operation is carried out. In this work, we investigate the application of QE methods to searching social multimedia content. In particular, we focus on social multimedia content where the information is primarily in the audio stream. To address the challenge of content variability, we introduce three speech segment-based methods for QE: semantic segmentation, discourse segmentation and window-based segmentation. Our experimental investigation illustrates the superiority of these segment-based methods over a standard full-document QE method on a version of the MediaEval 2012 Search task newly extended as an ad hoc search task.
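The segment-based QE methods build on the standard pseudo-relevance feedback loop, in which terms from assumed-relevant top-ranked items enrich the query. Below is a minimal sketch of that loop, assuming a hypothetical `search(query, k)` function that returns transcripts of the top-k retrieved speech segments; the three segmentation strategies would differ only in how those segments are delimited.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def expand_query(query, search, top_k=10, num_terms=5):
    """Enrich `query` with frequent terms from assumed-relevant segments."""
    query_terms = set(query.lower().split())
    term_counts = Counter()
    for segment_text in search(query, top_k):   # top-ranked speech segments
        for term in segment_text.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                term_counts[term] += 1
    expansion = [t for t, _ in term_counts.most_common(num_terms)]
    return query + " " + " ".join(expansion)
```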
Cited by: 5
Indexing multimedia learning materials in ultimate course search
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500250
Sheetal Rajgure, Krithika Raghavan, Vincent Oria, Reza Curtmola, Edina Renfro-Michel, P. Gouton
Multimedia is the main support for online learning materials, and the volume of multimedia learning materials is growing with the popularity of online programs offered by universities. Ultimate Course Search (UCS) is a tool that aims to provide efficient search of course materials. UCS integrates slides, lecture videos and textbook content into a single platform with search capabilities. The keywords extracted from the textbook index and the PowerPoint slides are the basis of the indexing scheme. The slides are indexed on the keywords, and the videos are indexed on the slides. The correspondence between slides and video segments is established using the metadata provided by the video recording software when available, and by image processing techniques otherwise. Unlike a classical document search, in which the user looks for where the keywords occur, searching learning materials in UCS is different because the user is also looking for where the search words are best explained. We propose a keyword-appearance-prioritized ranking mechanism that integrates the location information of the keyword within the slides into the ranking.
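A ranking that prioritizes where a keyword appears could look like the following sketch; the `Slide` structure, location labels and weights are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

LOCATION_WEIGHTS = {"title": 3.0, "bullet": 1.5, "body": 1.0}  # assumed weights

@dataclass
class Slide:
    slide_id: int
    keywords: dict  # keyword -> most prominent location on the slide

def score_slide(slide, query_terms):
    # A keyword found in the slide title counts more than one buried in the body.
    return sum(LOCATION_WEIGHTS.get(slide.keywords.get(t), 0.0)
               for t in query_terms)

def rank_slides(slides, query_terms):
    return sorted(slides, key=lambda s: score_slide(s, query_terms), reverse=True)
```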
Cited by: 1
Model-based video content representation
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500254
Lukas Diem, M. Zaharieva
Recurring visual elements in videos commonly represent central content entities, such as main characters and dominant objects. The automated detection of such elements is crucial for various application fields, ranging from compact video content summarization to the retrieval of videos sharing common visual entities. Recent approaches to content-based video analysis commonly require prior knowledge about the appearance of potential objects of interest, or build upon a specific assumption, such as the presence of a particular camera view, object motion, or a reference set for estimating the appearance of an object. In this paper, we propose an unsupervised, model-based approach for the detection of recurring visual elements in a video sequence. Detected elements do not necessarily represent an object, yet they allow for visual and semantic interpretation. The experimental evaluation of detected models across different videos demonstrates the ability of the models to capture potentially high diversity in the visual appearance of the traced elements.
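As a rough illustration of unsupervised recurrence detection (not the authors' specific model), one can cluster frame descriptors and keep clusters whose members span distant parts of the video; the clustering algorithm and all parameters below are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def recurring_clusters(frame_descriptors, frame_times, min_span=30.0):
    """frame_descriptors: (n, d) array; frame_times: (n,) array of seconds."""
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(frame_descriptors)
    recurring = []
    for label in set(labels) - {-1}:                # -1 marks DBSCAN noise
        times = frame_times[labels == label]
        if times.max() - times.min() >= min_span:   # members spread over time
            recurring.append(label)
    return labels, recurring
```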
Cited by: 0
Filterbank coefficients selection for segmentation in singer turns
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500273
Marwa Thlithi, J. Pinquier, Thomas Pellegrini, R. André-Obrecht
Audio segmentation is often the first step of audio indexing systems. It provides segments that are supposed to be acoustically homogeneous. In this paper, we report our recent experiments on segmenting music recordings into singer turns, by analogy with speaker turns in speech processing. We compare two acoustic feature types for this task: filterbank coefficients (FBANK) and Mel-frequency cepstral coefficients (MFCC). FBANK features were shown to outperform MFCC on a “clean” singing corpus. We describe a coefficient selection method that allowed further improvement on this corpus. A 75.8% F-measure was obtained with FBANK features selected by this method, corresponding to a 30.6% absolute gain over MFCC. On another corpus comprising ethno-musicological recordings, both feature types showed a similar performance of about 60%. This corpus presents an increased difficulty due to the presence of instruments overlapping the singing and to lower recording quality.
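For readers wanting to reproduce the two feature types being compared, the sketch below extracts them with librosa (a toolkit assumption; the paper does not name one). Log-mel filterbank energies serve as FBANK, MFCCs are their cepstral compression, and the variance-based selection at the end merely stands in for the paper's coefficient selection criterion.

```python
import librosa
import numpy as np

y, sr = librosa.load("recording.wav", sr=16000)     # illustrative file name

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
fbank = np.log(mel + 1e-10)                         # FBANK: log filterbank energies
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # MFCC: cepstral coefficients

# Stand-in for coefficient selection: keep the 20 highest-variance coefficients
# (the paper's actual selection criterion may differ).
keep = np.sort(np.argsort(fbank.var(axis=1))[::-1][:20])
fbank_selected = fbank[keep]
```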
Cited by: 0
Large scale content-based video retrieval with LIvRE
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500266
Gabriel de Oliveira Barra, M. Lux, Xavier Giró-i-Nieto
The fast growth of video data requires robust, efficient, and scalable systems to allow for indexing and retrieval. These systems must be accessible through lightweight, portable and usable interfaces that help users manage and search video content. This demo paper presents LIvRE, an extension of an existing open-source tool for image retrieval to support video indexing. LIvRE consists of three main system components (pre-processing, indexing and retrieval), as well as a scalable and responsive HTML5 user interface accessible from a web browser. LIvRE supports image-based queries, which are efficiently matched against the extracted frames of the indexed videos.
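A typical pre-processing step in such a pipeline is frame sampling, so that each extracted frame can be indexed by the image retrieval engine; a minimal OpenCV sketch, with illustrative parameter choices, follows.

```python
import cv2

def extract_frames(video_path, every_n_seconds=1.0):
    """Sample one frame per `every_n_seconds` for downstream indexing."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if FPS is unknown
    step = max(1, int(fps * every_n_seconds))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append((idx / fps, frame))     # (timestamp, image)
        idx += 1
    cap.release()
    return frames
```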
Cited by: 22
A hybrid graph-based and non-linear late fusion approach for multimedia retrieval
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500252
Ilias Gialampoukidis, A. Moumtzidou, Dimitris Liparas, S. Vrochidis, Y. Kompatsiaris
Nowadays, multimedia retrieval has become a task of high importance, due to the need for efficient and fast access to very large and heterogeneous multimedia collections. An interesting challenge within this task is the efficient combination of the different modalities of a multimedia object, and especially the fusion of textual and visual information. Unsupervised fusion of multiple modalities for retrieval has mostly been based on early-fusion, weighted-linear, graph-based and diffusion-based techniques. In contrast, we present a strategy for fusing textual and visual modalities through the combination of a non-linear fusion model and a graph-based late fusion approach. The fusion strategy is based on the construction of a uniform multimodal contextual similarity matrix and the non-linear combination of relevance scores from query-based similarity vectors. The proposed late fusion approach is evaluated on the multimedia retrieval task by applying it to two multimedia collections, namely WIKI11 and IAPR-TC12. The experimental results indicate its superiority over the baseline method in terms of Mean Average Precision on both datasets.
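As an illustration of the general shape of non-linear late fusion (the weights and cross-term below are assumptions, not the paper's exact model), per-modality relevance scores can be normalized and combined with a multiplicative interaction:

```python
import numpy as np

def fuse_scores(s_text, s_visual, w_t=0.5, w_v=0.3, w_x=0.2):
    """Combine per-item relevance scores from two modalities non-linearly."""
    s_text = s_text / (np.abs(s_text).max() + 1e-10)       # scale each modality
    s_visual = s_visual / (np.abs(s_visual).max() + 1e-10)
    # Linear terms plus a multiplicative cross-term rewarding agreement.
    return w_t * s_text + w_v * s_visual + w_x * s_text * s_visual

s_text = np.array([0.9, 0.2, 0.5])
s_visual = np.array([0.8, 0.7, 0.1])
ranking = np.argsort(-fuse_scores(s_text, s_visual))       # best items first
```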
Cited by: 14
Exploring an unsupervised, language independent, spoken document retrieval system
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500262
Alexandru Caranica, H. Cucu, Andi Buzo
With the increasing availability of spoken documents in different languages, there is a need for systems that perform automatic, unsupervised search over audio streams containing speech in a document retrieval scenario. We are interested in retrieving information from multilingual speech data: spoken documents such as broadcast news, video archives or even telephone conversations. The ultimate goal of a spoken document retrieval system is to enable vocabulary-independent search over large collections of speech content, to find written or spoken “queries” or recurring speech data. If the language is known, the task is relatively simple: one could use a large vocabulary continuous speech recognition (LVCSR) tool to produce highly accurate word transcripts, which are then indexed, and query terms are retrieved from the index. However, if the language is unknown, and queries are therefore not part of the recognizer's vocabulary, the relevant audio documents cannot be retrieved; search metrics suffer, and the documents retrieved are no longer relevant to the user. In this paper, we investigate whether input features derived from multi-language resources help the process of unsupervised spoken term detection, independently of the language. Moreover, we explore multi-objective search, combining language detection and LVCSR-based search with unsupervised Spoken Term Detection (STD). To achieve this, we make use of multiple open-source tools and in-house acoustic and language models to propose a language-independent spoken document retrieval system.
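When the language is known, the LVCSR-based branch reduces to indexing recognised transcripts; the following minimal sketch shows such an inverted index, with the recogniser output assumed to be available as plain text.

```python
from collections import defaultdict

def build_index(transcripts):
    """transcripts: dict mapping doc_id -> recognised text (LVCSR output)."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def retrieve(index, query):
    """Return documents containing every query term."""
    hits = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*hits) if hits else set()
```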
Cited by: 1
Indexing Ensembles of Exemplar-SVMs with rejecting taxonomies
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500241
Federico Becattini, Lorenzo Seidenari, A. Bimbo
Ensembles of Exemplar-SVMs have been used for a wide variety of tasks, such as object detection, segmentation, label transfer and mid-level feature learning. To make this technique effective, though, a large collection of classifiers is needed, which often makes the evaluation phase prohibitively expensive. To overcome this issue, we exploit the joint distribution of exemplar classifier scores to build a taxonomy capable of indexing each Exemplar-SVM and enabling a fast evaluation of the whole ensemble. We experiment with the Pascal 2007 benchmark on an object detection task and on a simple segmentation task, in order to verify the robustness of our indexing data structure relative to the standard ensemble. We also introduce a rejection strategy that discards irrelevant image patches for more efficient access to the data.
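A hedged sketch of the idea: exemplars are grouped by hierarchical clustering on their joint score statistics, and whole groups are rejected at test time when a representative scores low. Thresholds, the representative choice and the `exemplars` callables are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_taxonomy(score_matrix, n_groups=8):
    """score_matrix: (n_exemplars, n_calibration_samples) of E-SVM scores."""
    Z = linkage(score_matrix, method="ward")      # cluster similar exemplars
    return fcluster(Z, t=n_groups, criterion="maxclust")  # exemplar -> group id

def evaluate_with_rejection(x, exemplars, groups, reject_thresh=-1.0):
    """exemplars: list of callables, each scoring a sample x."""
    scores = {}
    for g in np.unique(groups):
        members = np.where(groups == g)[0]
        rep_score = exemplars[members[0]](x)      # representative exemplar
        if rep_score < reject_thresh:             # reject the whole group
            continue
        scores[members[0]] = rep_score
        for i in members[1:]:                     # evaluate surviving members
            scores[i] = exemplars[i](x)
    return scores
```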
Cited by: 1
Deep learning vs spectral clustering into an active clustering with pairwise constraints propagation
Pub Date: 2016-06-15 DOI: 10.1109/CBMI.2016.7500237
Nicolas Voiron, A. Benoît, P. Lambert, B. Ionescu
In our data-driven world, categorization is of major importance in helping end-users and decision makers understand information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain, and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without using any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs; this is called semi-supervised learning. In this use case, spectral clustering has proved to be an efficient method. Deep learning has also outperformed several state-of-the-art classification approaches, and it is interesting to test it in our context. In this paper, we first introduce the concept of deep learning into an active semi-supervised clustering process and compare it with spectral clustering. Second, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. The results show the potential of the clustering methods.
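A compact way to see how pairwise constraints enter spectral clustering is to edit the affinity matrix before the spectral step; the sketch below shows this standard mechanism (the paper's propagation scheme is more elaborate than these hard edits).

```python
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def constrained_spectral(X, must_links, cannot_links, n_clusters=3):
    A = rbf_kernel(X)                       # base pairwise affinities
    for i, j in must_links:
        A[i, j] = A[j, i] = 1.0             # force maximal similarity
    for i, j in cannot_links:
        A[i, j] = A[j, i] = 0.0             # forbid the pairing
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(A)
```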
Cited by: 3
Comparing and combining unimodal methods for multimodal recognition
Pub Date: 2016-06-01 DOI: 10.1109/CBMI.2016.7500253
S. Ishikawa, Jorma T. Laaksonen
Multimodal recognition has recently become a more attractive and common approach in multimedia information retrieval. In many cases it yields better recognition results than unimodal methods alone. Most current multimodal recognition methods still depend on unimodal recognition results. Therefore, in order to obtain better recognition performance, it is important to choose suitable features and classification models for each unimodal recognition task. In this paper, we study several unimodal recognition methods, the features they use and techniques for combining them, in the application setting of concept detection in image-text data. For image features, we use GoogLeNet deep convolutional neural network (DCNN) activation features and semantic concept vectors. For text features, we use simple binary vectors for tags, and word2vec vectors. As the concept detection model, we apply the Multimodal Deep Boltzmann Machine (DBM) model and the Support Vector Machine (SVM) with the linear homogeneous kernel map and the non-linear radial basis function (RBF) kernel. Experimental results on the MIRFLICKR-1M data set show that the Multimodal DBM and the non-linear SVM approaches produce equally good results within the margins of statistical variation.
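One of the compared configurations, an SVM with an RBF kernel over combined image and text features, can be sketched as follows; the random features stand in for real GoogLeNet activations and word2vec vectors, and all shapes are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(100, 1024))   # stand-in for DCNN activations
text_feats = rng.normal(size=(100, 300))     # stand-in for word2vec vectors
labels = rng.integers(0, 2, size=100)        # one binary concept

X = np.hstack([image_feats, text_feats])     # combine the two modalities
clf = SVC(kernel="rbf", C=1.0).fit(X, labels)
scores = clf.decision_function(X)            # per-image concept scores
```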
Cited by: 1