
2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI): Latest Publications

Fast geometric consistency test for real time logo detection
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153636
N. Zikos, A. Delopoulos
In this paper we present a method for logo detection in image collections and streams. The proposed method is based on features, extracted from reference logo images and test images. Extracted features are combined with respect to their similarity in their descriptors' space and afterwards with respect to their geometric consistency on the image plane. The contribution of this paper is a novel method for fast geometric consistency test. Using state of the art fast matching methods, it produces pairs of similar features between the test image and the reference logo image and then examines which pairs are forming a consistent geometry on both the test and the reference logo image. It is noteworthy that the proposed method is scale, rotation and translation invariant. The key advantage of the proposed method is that it exhibits a much lower computational complexity and better performance than the state of the art methods. Experimental results on large scale datasets are presented to support these statements.
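The sketch below is only a rough illustration of this kind of pairwise test, not the authors' algorithm: every pair of feature correspondences yields a local scale and rotation estimate, and matches that agree with the dominant (median) transform are kept, which makes the check invariant to scale, rotation and translation. The function name, tolerances and voting threshold are assumptions.

```python
import math
from statistics import median

def _wrap(angle):
    """Map an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def filter_consistent_matches(matches, scale_tol=0.25, angle_tol=math.radians(15)):
    """matches: list of ((x_ref, y_ref), (x_test, y_test)) keypoint correspondences.
    Keep matches whose pairwise scale/rotation estimates agree with the dominant
    (median) transform: a translation-, rotation- and scale-invariant test."""
    pair_stats, scales, angles = [], [], []
    for i in range(len(matches)):
        for j in range(i + 1, len(matches)):
            (ri, ti), (rj, tj) = matches[i], matches[j]
            dr = (rj[0] - ri[0], rj[1] - ri[1])   # segment in the reference logo
            dt = (tj[0] - ti[0], tj[1] - ti[1])   # corresponding segment in the test image
            len_r, len_t = math.hypot(*dr), math.hypot(*dt)
            if len_r < 1e-6 or len_t < 1e-6:
                continue
            s = len_t / len_r                     # local scale estimate
            a = _wrap(math.atan2(dt[1], dt[0]) - math.atan2(dr[1], dr[0]))  # local rotation
            pair_stats.append((i, j, s, a))
            scales.append(s)
            angles.append(a)
    if not pair_stats:
        return []
    s_med, a_med = median(scales), median(angles)
    votes = [0] * len(matches)
    for i, j, s, a in pair_stats:
        if abs(s - s_med) <= scale_tol * s_med and abs(_wrap(a - a_med)) <= angle_tol:
            votes[i] += 1
            votes[j] += 1
    min_votes = max(1, len(matches) // 4)         # heuristic vote threshold
    return [m for v, m in zip(votes, matches) if v >= min_votes]
```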
Citations: 2
Pruning near-duplicate images for mobile landmark identification: A graph theoretical approach
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153635
T. Danisman, J. Martinet, Ioan Marius Bilasco
Automatic landmark identification is one of the hot research topics in the computer vision domain. Efficient and robust identification of landmark points is a challenging task, especially in a mobile context. This paper addresses the pruning of near-duplicate images for creating representative training image sets to minimize overall query processing complexity and time. We prune different perspectives of real-world landmarks to find the smallest set of the most representative images. Inspired by graph theory, we represent each class in a separate graph using geometric verification with the well-known RANSAC algorithm. Our iterative method uses maximum coverage information in each iteration to find the minimum representative set to reduce and prioritize the images of the initial dataset. Experiments on the Paris dataset show that the proposed method provides robust and accurate results using smaller subsets.
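As a hedged illustration of the maximum-coverage idea (assuming the near-duplicate graph has already been built by RANSAC-based geometric verification), the sketch below greedily picks, at each iteration, the image whose neighbourhood covers the most not-yet-represented images; it is not the paper's exact procedure and all names are illustrative.

```python
def greedy_representatives(graph):
    """graph: dict mapping image id -> set of image ids it is a near-duplicate of.
    Returns a small list of representatives that together cover every image."""
    uncovered = set(graph)                     # images still without a representative
    representatives = []
    while uncovered:
        # Pick the image whose near-duplicate neighbourhood covers most uncovered images.
        best = max(uncovered, key=lambda img: len((graph[img] | {img}) & uncovered))
        representatives.append(best)
        uncovered -= graph[best] | {best}
    return representatives

# Example: images a..e, where a matches b and c, and d matches e.
demo_graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": {"e"}, "e": {"d"}}
print(greedy_representatives(demo_graph))      # e.g. ['a', 'd'] (or ['a', 'e'])
```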
Citations: 0
Fusion of learned multi-modal representations and dense trajectories for emotional analysis in videos
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153603
Esra Acar, F. Hopfgartner, S. Albayrak
When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation.
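A minimal sketch of the late-fusion step described above, assuming the per-modality features (e.g. CNN-learned audio and colour representations, dense-trajectory descriptors) are already available as matrices: one SVM per modality is trained on the four Valence-Arousal quadrants and their class probabilities are combined by a weighted average. The fusion weights and function names are assumptions, not the paper's exact scheme.

```python
import numpy as np
from sklearn.svm import SVC

def train_fused_classifier(feature_sets, labels, weights=None):
    """feature_sets: list of (n_samples, n_dims) arrays, one per modality.
    labels: array of quadrant ids in {0, 1, 2, 3}. Returns (models, weights)."""
    models = [SVC(probability=True).fit(X, labels) for X in feature_sets]
    weights = weights or [1.0 / len(models)] * len(models)
    return models, weights

def predict_fused(models, weights, feature_sets):
    # Average the per-modality class probabilities and take the arg-max quadrant.
    probs = sum(w * m.predict_proba(X) for m, w, X in zip(models, weights, feature_sets))
    return np.argmax(probs, axis=1)
```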
Citations: 11
Learned features versus engineered features for semantic video indexing
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153637
Mateusz Budnik, Efrain-Leonardo Gutierrez-Gomez, Bahjat Safadi, G. Quénot
In this paper, we compare “traditional” engineered (hand-crafted) features (or descriptors) and learned features for content-based semantic indexing of video documents. Learned (or semantic) features are obtained by training classifiers for other target concepts on other data. These classifiers are then applied to the current collection. The vector of classification scores is the new feature used for training a classifier for the current target concepts on the current collection. If the classifiers used on the other collection are of the Deep Convolutional Neural Network (DCNN) type, it is possible to use as a new feature not only the score values provided by the last layer but also the intermediate values corresponding to the output of all the hidden layers. We made an extensive comparison of the performance of such features with traditional engineered ones as well as with combinations of them. The comparison was made in the context of the TRECVid semantic indexing task. Our results confirm those obtained for still images: features learned from other training data generally outperform engineered features for concept recognition. Additionally, we found that directly training SVM classifiers using these features does significantly better than partially retraining the DCNN to adapt it to the new data. We also found that, even though the learned features performed better than the engineered ones, the fusion of both performs significantly better, indicating that engineered features are still useful, at least in this case.
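To make the "learned features" idea concrete, here is a small hedged sketch: activations of a pre-trained CNN (torchvision's ResNet-18 here, chosen purely for illustration; the paper predates it) serve as the feature vector on which a linear SVM is trained for a new target concept. The layer choice, model and names are assumptions.

```python
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()          # drop the final layer: output = penultimate activations
cnn.eval()

def extract_features(batch):          # batch: (N, 3, 224, 224) normalised image tensor
    with torch.no_grad():
        return cnn(batch).numpy()     # (N, 512) learned feature vectors

# Train a linear SVM on the learned features for the new target concept.
# X_train (image tensor batch) and y_train (concept labels) are assumed to exist:
# svm = LinearSVC().fit(extract_features(X_train), y_train)
```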
Citations: 15
Web image size prediction for efficient focused image crawling
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153609
K. Andreadou, S. Papadopoulos, Y. Kompatsiaris
In the context of using Web image content for analysis and retrieval, it is typically necessary to perform large-scale image crawling. A serious bottleneck in such set-ups pertains to the fetching of image content, since for each web page a large number of HTTP requests need to be issued to download all included image elements. In practice, however, only the relatively big images (e.g., larger than 400 pixels in width and height) are potentially of interest, since most of the smaller ones are irrelevant to the main subject or correspond to decorative elements (e.g., icons, buttons). Given that there is often no dimension information in the HTML img tag of images, to filter out small images, an image crawler would still need to issue a GET request and download the respective files before deciding whether to index them. To address this limitation, in this paper, we explore the challenge of predicting the size of images on the Web based only on their URL and information extracted from the surrounding HTML code. We present two different methodologies: The first one is based on a common text classification approach using the n-grams or tokens of the image URLs and the second one relies on the HTML elements surrounding the image. Eventually, we combine these two techniques, and achieve considerable improvement in terms of accuracy, leading to a highly effective filtering component that can significantly improve the speed and efficiency of the image crawler.
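A minimal sketch of the first methodology (URL text classification), assuming a labelled set of URLs is available: character n-grams of the URL feed a standard classifier that predicts whether the image exceeds the size threshold. The model choice, toy data and names are illustrative, not the paper's exact pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

urls = ["http://example.com/images/photo_1024x768.jpg",
        "http://example.com/static/icons/arrow_16.png"]
labels = [1, 0]        # 1 = likely larger than the 400-pixel threshold, 0 = small

model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(3, 5)),   # URL character n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(urls, labels)
print(model.predict(["http://example.com/img/banner_800x600.jpg"]))
```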
Citations: 0
On efficient content-based near-duplicate video detection
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153633
M. S. Uysal, C. Beecks, T. Seidl
The heavy use of the Internet, in particular video-sharing and social networking websites, has recently led to an enormous amount of video data, raising the demand for effective and efficient content-based near-duplicate video detection approaches. In this paper, we propose to efficiently search for near-duplicate videos by utilizing efficient approximation techniques for the well-known and effective similarity measure Earth Mover's Distance (EMD). To this end, we model keyframes by flexible feature representations which are then exploited in a filter-and-refine architecture to alleviate the query processing time. The experiments on real data indicate high efficiency, guaranteeing a reduced number of EMD computations, which contributes to near-duplicate detection in video datasets.
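The sketch below illustrates the filter-and-refine idea in general terms (not the paper's specific EMD approximations): the classical centroid lower bound of the EMD cheaply prunes candidates, and only the survivors are verified with an exact EMD computation via the POT library. The signature format, threshold and names are assumptions.

```python
import numpy as np
import ot                                   # Python Optimal Transport (pip install pot)
from scipy.spatial.distance import cdist

def centroid_lower_bound(pos_a, w_a, pos_b, w_b):
    # Distance between weighted centroids never exceeds the EMD (equal total mass).
    return np.linalg.norm(pos_a.T @ w_a - pos_b.T @ w_b)

def exact_emd(pos_a, w_a, pos_b, w_b):
    cost = cdist(pos_a, pos_b)              # ground distances between cluster centres
    return ot.emd2(w_a, w_b, cost)

def near_duplicates(query, candidates, threshold):
    """query/candidates: (positions, weights) signatures with weights summing to 1."""
    hits = []
    for cid, (pos, w) in candidates.items():
        if centroid_lower_bound(query[0], query[1], pos, w) > threshold:
            continue                        # filter step: cannot be a near-duplicate
        if exact_emd(query[0], query[1], pos, w) <= threshold:
            hits.append(cid)                # refine step: exact EMD confirms the match
    return hits
```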
Citations: 9
Empirical evaluation of dissimilarity measures for 3D object retrieval with application to multi-feature retrieval
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153629
Robert Gregor, Andreas Lamprecht, I. Sipiran, T. Schreck, B. Bustos
A common approach for implementing content-based multimedia retrieval tasks resorts to extracting high-dimensional feature vectors from the multimedia objects. In combination with an appropriate dissimilarity function, such as the well-known Lp functions or statistical measures like χ2, one can rank objects by dissimilarity with respect to a query. For many multimedia retrieval problems, a large number of feature extraction methods have been proposed and experimentally evaluated for their effectiveness. Much less work has been done to systematically study the impact of the choice of dissimilarity function on the retrieval effectiveness. Inspired by previous work which compared dissimilarity functions for image retrieval, we provide an extensive comparison of dissimilarity measures for 3D object retrieval. Our study is based on an encompassing set of feature extractors, dissimilarity measures and benchmark data sets. We identify the best performing dissimilarity measures and in turn identify dependencies between well-performing dissimilarity measures and types of 3D features. Based on these findings, we show that the effectiveness of 3D retrieval can be improved by a feature-dependent measure choice. In addition, we apply different normalization schemes to the dissimilarity distributions in order to show improved retrieval effectiveness for late fusion of multi-feature combination. Finally, we present preliminary findings on the correlation of rankings for dissimilarity measures, which could be exploited for further improvement of retrieval effectiveness for single features as well as combinations.
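For context, here is a small sketch of what such a comparison involves in practice, assuming feature vectors are plain numpy arrays: a few standard dissimilarity measures (L1, L2, χ2) and a simple min-max normalisation of the resulting distance distributions so that scores from different measures can be fused later. The normalisation choice and names are illustrative, not the exact schemes evaluated in the paper.

```python
import numpy as np

def l1(x, y):
    return np.abs(x - y).sum()

def l2(x, y):
    return np.sqrt(((x - y) ** 2).sum())

def chi2(x, y):
    s, d = x + y, (x - y) ** 2
    return 0.5 * np.sum(np.where(s > 0, d / np.maximum(s, 1e-12), 0.0))

def normalised_scores(query, collection, measure):
    """Distances of `query` to every vector in `collection`, min-max normalised
    so scores from different measures can be fused (e.g. by averaging)."""
    d = np.array([measure(query, x) for x in collection])
    span = d.max() - d.min()
    return (d - d.min()) / span if span > 0 else np.zeros_like(d)
```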
Citations: 7
A multi-dimensional meter-adaptive method for automatic segmentation of music
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153601
Cyril Gaudefroy, H. Papadopoulos, M. Kowalski
Music structure appears on a wide variety of temporal levels (notes, bars, phrases, etc.). Its highest-level expression is therefore dependent on music's lower-level organization, especially beats and bars. We propose a method for automatic structure segmentation that uses musically meaningful information and is content-adaptive. It relies on a meter-adaptive signal representation that avoids the use of empirical parameters. Moreover, our method is designed to combine multiple signal features to account for various musical dimensions. Finally, it also combines multiple structural principles that yield complementary results. The resulting algorithm already outperforms state-of-the-art methods, especially within small tolerance windows, and still offers several encouraging improvement directions.
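The paper's own algorithm is not reproduced here; as a generic point of reference, the sketch below shows a classic baseline for the same problem, Foote's checkerboard-kernel novelty computed on bar-synchronous features and averaged over several musical dimensions, with naive peak picking for boundaries. The kernel size and names are assumptions.

```python
import numpy as np

def novelty_curve(features, kernel_size=8):
    """features: (n_bars, n_dims) bar-synchronous descriptors for one musical dimension."""
    sim = features @ features.T                     # self-similarity (inner product)
    half = kernel_size // 2
    kernel = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((half, half)))
    n = len(features)
    nov = np.zeros(n)
    for i in range(half, n - half):
        nov[i] = np.sum(kernel * sim[i - half:i + half, i - half:i + half])
    return nov

def segment_boundaries(feature_sets, kernel_size=8):
    # Average the novelty curves of all feature sets, then keep local maxima
    # above the mean as candidate structural boundaries (bar indices).
    nov = np.mean([novelty_curve(f, kernel_size) for f in feature_sets], axis=0)
    return [i for i in range(1, len(nov) - 1)
            if nov[i] > nov[i - 1] and nov[i] > nov[i + 1] and nov[i] > nov.mean()]
```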
Citations: 4
An ontology framework for automated visual surveillance system
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153628
Faranak Sobhani, N. F. Kahar, Qianni Zhang
This paper presents the analysis and development of a forensic domain ontology to support an automated visual surveillance system. The proposed domain ontology is built on a specific use case based on the severe riots that swept across major UK cities with devastating effects during the summer of 2011. The proposed ontology aims at facilitating the description of activities, entities, relationships, resources and consequences of the event. The study exploits 3.07 TB of data provided by London's Metropolitan Police (Scotland Yard) as part of the European LASIE project. The data has been analyzed and used to guarantee adherence to a real-world application scenario. A 'top-down development' approach to the ontology design has been taken. The ontology is also used to demonstrate how high-level reasoning can be incorporated into an automated forensic system. Thus, the designed ontology is also the basis for future development of knowledge inference in response to domain-specific queries.
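To make the ontology idea concrete, here is a tiny hypothetical fragment expressed as RDF triples with rdflib; the class and property names are invented for illustration and are not taken from the paper or the LASIE project.

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/forensic#")   # hypothetical namespace
g = Graph()

# Classes: events, entities and resources involved in an incident.
for cls in ("Event", "Person", "Vehicle", "CCTVFootage"):
    g.add((EX[cls], RDF.type, RDFS.Class))

# Properties relating entities to events and evidence.
g.add((EX.participatesIn, RDF.type, RDF.Property))
g.add((EX.capturedBy, RDF.type, RDF.Property))

# An example instance: a person captured by a CCTV clip while taking part in an event.
g.add((EX.person42, RDF.type, EX.Person))
g.add((EX.person42, EX.participatesIn, EX.incident1))
g.add((EX.person42, EX.capturedBy, EX.clip7))
g.add((EX.clip7, RDFS.label, Literal("camera 12 clip")))

print(g.serialize(format="turtle"))
```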
Citations: 6
Semi-automatic video object segmentation by advanced manipulation of segmentation hierarchies
Pub Date : 2015-06-10 DOI: 10.1109/CBMI.2015.7153600
J. Pont-Tuset, Miquel A. Farre, A. Smolic
For applications that require very accurate video object segmentations, semi-automatic algorithms are typically used, which help operators to minimize the annotation time, as off-the-shelf automatic segmentation techniques are still far from precise enough in this context. This paper presents a novel interface based on a click-and-drag interaction that allows operators to rapidly select regions from state-of-the-art segmentation hierarchies. The interface is very responsive, allows very accurate segmentations to be obtained, and is designed to minimize human interaction. To evaluate the results, we provide a new set of video object ground-truth data.
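The interface itself is interactive and cannot be reproduced here; as a toy sketch of the underlying idea, the code below models a segmentation hierarchy as a tree of regions, maps a click to the finest region containing it, and coarsens the selection by walking towards the root, loosely mimicking a click-and-drag refinement. All data structures and names are assumptions, not the paper's interface.

```python
class Region:
    def __init__(self, pixels, parent=None):
        self.pixels = set(pixels)            # pixel coordinates covered by this region
        self.parent = parent                 # coarser region containing this one
        anc = parent
        while anc is not None:               # propagate coverage up to all ancestors
            anc.pixels |= self.pixels
            anc = anc.parent

def leaf_under(click, leaves):
    """Return the finest region containing the clicked pixel."""
    return next(r for r in leaves if click in r.pixels)

def coarsen(region, levels):
    """Walk `levels` steps towards the root, as a drag gesture might request."""
    for _ in range(levels):
        if region.parent is None:
            break
        region = region.parent
    return region

# Tiny hierarchy: two leaf regions merging into one object-level region.
root = Region([])
obj = Region([], parent=root)
leaf_a = Region([(0, 0), (0, 1)], parent=obj)
leaf_b = Region([(1, 0), (1, 1)], parent=obj)
picked = leaf_under((0, 1), [leaf_a, leaf_b])
print(len(coarsen(picked, 1).pixels))        # 4: the drag expands the click to the whole object
```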
Citations: 15