
2009 Seventh International Workshop on Content-Based Multimedia Indexing: Latest Publications

3D Object Detection and Viewpoint Selection in Sketch Images Using Local Patch-Based Zernike Moments
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.29
Anh-Phuong Ta, Christian Wolf, G. Lavoué, A. Baskurt
In this paper we present a new approach to detecting and recognizing 3D models in the 2D storyboards drawn during the production of animated cartoons. Our method is robust to occlusion, scale, and rotation. The lack of texture and color makes it difficult to extract local features of the target object from a sketched storyboard; existing approaches using local descriptors such as interest points can therefore fail on such images. We propose a new framework that combines patch-based Zernike descriptors with a method enforcing spatial constraints to accurately detect 3D models represented as a set of 2D views in the storyboards. Experimental results show that the proposed method can deal with partial object occlusion and is suitable for poorly textured objects.
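As a rough illustration of the patch-based Zernike idea, the sketch below densely samples patches from a binary sketch image and describes each by its Zernike moment magnitudes, which are rotation invariant. The mahotas library and the patch size, grid stride, moment degree, and matching threshold are all assumptions of this sketch, not the authors' configuration.

```python
# A minimal sketch of patch-based Zernike description, assuming mahotas.
import numpy as np
import mahotas

def patch_zernike_descriptors(image, patch=32, stride=16, degree=8):
    """Compute rotation-invariant Zernike moment magnitudes on a dense grid."""
    h, w = image.shape
    descriptors, positions = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = image[y:y + patch, x:x + patch]
            if p.any():  # skip blank regions of the sketch
                z = mahotas.features.zernike_moments(p, radius=patch // 2, degree=degree)
                descriptors.append(z)
                positions.append((y, x))
    return np.array(descriptors), positions

def match_view(query_desc, view_desc, threshold=0.1):
    """Count query patches whose nearest view patch lies within a distance threshold."""
    if len(query_desc) == 0 or len(view_desc) == 0:
        return 0
    hits = 0
    for q in query_desc:
        if np.linalg.norm(view_desc - q, axis=1).min() < threshold:
            hits += 1
    return hits
```

The paper additionally enforces spatial constraints between matched patches; the count-of-matches score above is only the first, appearance-based step.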
Citations: 8
Content Based Copy Detection with Coarse Audio-Visual Fingerprints
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.12
A. Saracoglu, E. Esen, Tugrul K. Ates, Banu Oskay Acar, Ünal Zubari, Ezgi C. Ozan, Egemen Özalp, Aydin Alatan, T. Çiloglu
Content-Based Copy Detection (CBCD) emerges as a viable alternative to the active detection methodology of watermarking. The first reason is that media already in circulation cannot be marked; secondly, CBCD can inherently endure various severe attacks that watermarking cannot. Although media content is generally handled independently as visual and audio, in this work both information sources are utilized in a unified framework in which coarse representations of fundamental features are employed. From the copy detection perspective, the number of attacks on audio content is limited compared with the visual case. Therefore audio, if present, is an indispensable part of a robust video copy detection system. In this study, the validity of this statement is demonstrated through various experiments on a large data set.
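The abstract does not specify the fingerprint features, so the following sketch only illustrates the coarse-fingerprint idea: a frame is reduced to a small binary hash, candidate copies are scored by Hamming similarity, and audio and visual scores are fused late with an assumed equal weighting.

```python
# A minimal sketch of coarse fingerprinting and late fusion; the inputs
# (grayscale frames, precomputed audio similarities) and the weighting
# are assumptions, not the paper's actual CBCD features.
import numpy as np

def coarse_visual_fingerprint(frame_gray, size=8):
    """Binarize a block-mean downsampled frame against its median: a 64-bit coarse hash."""
    h, w = frame_gray.shape
    small = frame_gray[:h - h % size, :w - w % size].reshape(
        size, (h - h % size) // size, size, (w - w % size) // size).mean(axis=(1, 3))
    return (small > np.median(small)).astype(np.uint8).ravel()

def hamming_similarity(fp_a, fp_b):
    """Fraction of matching bits between two fingerprints of equal length."""
    return 1.0 - np.count_nonzero(fp_a != fp_b) / fp_a.size

def fused_score(visual_sim, audio_sim, w_visual=0.5):
    """Late fusion of the two modalities; equal weighting is an assumption."""
    return w_visual * visual_sim + (1.0 - w_visual) * audio_sim
```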
Citations: 47
Video Browsing Using Interactive Navigation Summaries
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.40
Klaus Schöffmann, L. Böszörményi
A new approach for interactive video browsing is described. The novelty of the proposed approach is the flexible concept of interactive navigation summaries. Similar to the time sliders commonly found in standard soft video players, navigation summaries allow random access to a video. In addition, they provide abstract visualizations of the content at a user-defined level of detail and thus quickly communicate content characteristics to the user. Navigation summaries can provide visual information about low-level as well as high-level features. The concept fully integrates the user, who knows best which navigation summary at which level of detail is most beneficial for the current video browsing task, and provides a flexible set of navigation means. A first user study has shown that our approach can significantly outperform standard soft video players, the state-of-the-art "poor man's" video browsing tool.
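As a toy example of a low-level navigation summary, the sketch below reduces a video to a colour strip, one mean colour per slot, which a player could render under the time slider; the paper's summaries are interactive and configurable, so this only hints at the underlying visualization.

```python
# A minimal sketch of one possible low-level navigation summary: a colour
# strip where each slot shows the mean colour of the frames it covers.
import numpy as np

def color_strip(frames, slots=100):
    """frames: sequence of HxWx3 uint8 frames -> (slots, 3) array of mean colours."""
    strip = np.zeros((slots, 3))
    bounds = np.linspace(0, len(frames), slots + 1, dtype=int)
    for i in range(slots):
        segment = frames[bounds[i]:bounds[i + 1]]
        if len(segment):
            strip[i] = np.mean([f.mean(axis=(0, 1)) for f in segment], axis=0)
    return strip.astype(np.uint8)
```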
Citations: 39
Special Session: Scalable Video Indexing
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.55
Ewa Kijak, J. Benois-Pineau
The scope of this special session is to cover all aspects related to the indexing and retrieval of image and video content that deal with the scalability issue, especially in the context of JPEG2000 and MPEG-4 AVC/H.264 coding. The papers from the session are briefly summarized.
Citations: 0
Rushes Video Parsing Using Video Sequence Alignment
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.49
Emilie Dumont, B. Mérialdo
In this paper, we propose a novel method, inspired by the bio-informatics domain, to parse a rushes video into scenes and takes. The Smith-Waterman algorithm provides an efficient way to compare sequences by comparing segments of all possible lengths and optimizing the similarity measure. We adapt this method to detect repetitive sequences in rushes video. Based on the alignments found, we can parse the video into scenes and takes. By comparing takes against each other, we can select the most complete take in each scene. This method is evaluated on several rushes videos from the TRECVID BBC Rushes Summarization campaign.
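The Smith-Waterman recurrence the method builds on can be sketched directly; below it is applied to sequences of shot descriptors with an assumed similarity function (positive for matching shots, negative otherwise) and an illustrative gap penalty.

```python
# A minimal sketch of Smith-Waterman local alignment over shot descriptors.
import numpy as np

def smith_waterman(seq_a, seq_b, sim, gap=-1.0):
    """Return the best local alignment score between two descriptor sequences."""
    n, m = len(seq_a), len(seq_b)
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(0.0,                                        # restart the alignment
                          H[i - 1, j - 1] + sim(seq_a[i - 1], seq_b[j - 1]),
                          H[i - 1, j] + gap,
                          H[i, j - 1] + gap)
    return H.max()

# Example similarity: +1 for close colour histograms, -1 otherwise (an assumption).
def hist_sim(h1, h2, tau=0.2):
    return 1.0 if np.abs(h1 - h2).sum() < tau else -1.0
```

A high local alignment score between two segments of the same rushes video indicates a repeated take; tracing back from the maximal cell recovers the aligned span.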
Citations: 15
Semi-automatic BPT for Image Retrieval
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.17
Shirin Ghanbari, J. Woods, S. Lucas
This paper presents a novel semi-automatic tool for content retrieval. A multi-dimensional Binary Partition Tree (BPT) is generated to perform object-based image retrieval. The tree is colour-based but has the advantage of incorporating spatial frequency to form semantically meaningful tree nodes. For retrieval, a node of the query image is matched against the nodes of the BPT of the database image. These are matched according to a combination of colour histograms, texture features, and edge histograms. This semi-automatic tool allows users more freedom in their choice of query. The paper illustrates how the use of multi-dimensional information can significantly enhance content retrieval results for natural images.
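A minimal sketch of the node-matching step described above: each BPT node is summarized by colour, texture, and edge-histogram features, and a query node is matched to database nodes by a weighted combination of per-feature distances. The weights and the L1 metric are assumptions, not the paper's exact choices.

```python
# A minimal sketch of combined-feature node matching for BPT retrieval.
import numpy as np

def l1(a, b):
    return np.abs(np.asarray(a) - np.asarray(b)).sum()

def node_distance(q, n, w=(0.5, 0.25, 0.25)):
    """q, n: dicts with 'color', 'texture', and 'edge' feature vectors."""
    return (w[0] * l1(q['color'], n['color']) +
            w[1] * l1(q['texture'], n['texture']) +
            w[2] * l1(q['edge'], n['edge']))

def best_match(query_node, db_nodes):
    """Return the database node closest to the query node."""
    return min(db_nodes, key=lambda n: node_distance(query_node, n))
```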
Citations: 1
Special Session: Multimedia Indexing for Content Based Search
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.54
M. Brambilla, F. Nucci
Worldwide, the volume of stored information is growing exponentially, and an increasing share of it is audiovisual content. This content drives the demand for new services, making audiovisual search one of the major challenges for organisations and businesses today. Digital data is the greatest value that many organisations possess, and the ability to use it, rather than just store it, will be one of the most important strategic aspects of the coming decade. In this scenario, several research efforts have focused on advanced search architectures that enable consumers, businesses, and organisations to unlock the value found in audiovisual content through innovative access paradigms. In particular, current research focuses on managing and enabling access to information sources of all types, supporting advanced audiovisual processing and content handling that will enhance the control, creation, and sharing of multimedia for all users in the value chain. Several research projects financed by the European Commission tackle this problem from different perspectives and provide diverse visions for the future. This will have a mid-to-long-term impact on the audiovisual industry, allowing companies to provide more effective and efficient access to content thanks to innovative annotation techniques and search paradigms. The aim of this CBMI special session, organized with the support of the PHAROS project (Platform for searcHing of Audiovisual Resources across Online Spaces, Integrated Project IST-2005-2.6.3, financed by the EC IST 6th Framework), is to offer an overview of research initiatives at the European level that address the problems related to processing, annotation, indexing, and provisioning of content within search applications. The session includes nine peer-reviewed contributions, reported in this volume, and an invited speech. The invited speech, given by professor Stefano Ceri of Politecnico di Milano, Italy, delivers a visionary discussion of Search Computing, a novel multi-disciplinary science that will provide the abstractions, foundations, methods, and tools required to answer cross-domain search queries that cannot be addressed by current search engines. A typical example of a multi-domain query is "Where can I attend an interesting Information Retrieval conference close to a sunny beach, with a direct flight connection to Europe and nice, cheap hotel accommodation?". The generality of the problem makes it extremely relevant for the information retrieval community and poses additional challenges to the field of multimedia content annotation and indexing. The Search Computing project is currently financed by the ERC under the IDEAS Advanced Grants programme.

Other contributions to the session include a work by Daras and Axenopoulos, who present a novel view-based approach for 3D object retrieval that exploits the automatic generation of a set of 2D images from a 3D object to compute a global shape similarity between two 3D models, and that can support multimodal queries. The work by Bozzon, Brambilla, and Fraternali discusses the use of a model-driven approach to specify multimedia indexing processes, verify properties of interest in those processes, and generate the code of orchestration components, enabling rapid prototyping of content analysis processes under changing requirements. Zidouni, Quafafou, and Glotin focus on the role of structure in named entity retrieval from audio transcriptions; they exploit this information, extracted by means of Conditional Random Fields (CRFs), to infer the best hierarchical structure of the space of concepts (named entities), represented by nodes or any sub-path in the hierarchy.
Citations: 0
Action Categorization in Soccer Videos Using String Kernels
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.10
Lamberto Ballan, M. Bertini, A. Bimbo, G. Serra
Action recognition is a crucial task in providing high-level semantic descriptions of video content, particularly in the case of sports videos. The bag-of-words (BoW) approach has proven successful for the categorization of objects and scenes in images, but it is unable to model the temporal information between consecutive frames needed for video event recognition. In this paper, we present an approach that models actions as a sequence of histograms (one for each frame), each represented using a traditional bag-of-words model. An action is thus described by a string (phrase) of variable size, depending on the clip's length, where each frame's representation is treated as a character. To compare these strings we use the Needleman-Wunsch distance, a metric defined in information theory that deals with strings of different lengths. Finally, SVMs with a string kernel that incorporates this distance are used to perform classification. Experimental results demonstrate the validity of the proposed approach and show that it outperforms baseline kNN classifiers.
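The Needleman-Wunsch score between two such "action strings" can be computed with the classic dynamic program; in the sketch below, each character stands for a quantized frame histogram, and the match, mismatch, and gap scores are illustrative assumptions.

```python
# A minimal sketch of Needleman-Wunsch global alignment between two
# "action strings" (one character per frame).
import numpy as np

def needleman_wunsch(s1, s2, match=1.0, mismatch=-1.0, gap=-1.0):
    """Return the global alignment score between two strings."""
    n, m = len(s1), len(s2)
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = gap * np.arange(n + 1)  # aligning a prefix against gaps
    F[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i, j] = max(F[i - 1, j - 1] + s,   # (mis)match
                          F[i - 1, j] + gap,     # gap in s2
                          F[i, j - 1] + gap)     # gap in s1
    return F[n, m]

# A distance derived from this score could then feed a string kernel for the
# SVM, e.g. k(x, y) = exp(-d(x, y)); the exact kernel form is an assumption.
```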
Citations: 14
Dominant Color Extraction Based on Dynamic Clustering by Multi-dimensional Particle Swarm Optimization
Pub Date : 2009-06-03 DOI: 10.1109/CBMI.2009.11
S. Kiranyaz, Stefan Uhlmann, M. Gabbouj
Color is a major source of information widely used in image analysis and content-based retrieval. Extracting the dominant colors that are prominent in a visual scene is of utmost importance, since the human visual system primarily uses them for perception. In this paper we address dominant color extraction as a dynamic clustering problem and use techniques based on Particle Swarm Optimization (PSO) for finding the optimal (number of) dominant colors in a given color space, distance metric, and a proper validity index function. The first technique, so-called Multi-Dimensional (MD) PSO, re-forms the native structure of swarm particles in such a way that they can make inter-dimensional passes with a dedicated dimensional PSO process. Therefore, in a multidimensional search space where the optimum dimension is unknown, swarm particles can seek both positional and dimensional optima. Nevertheless, MD PSO is still susceptible to premature convergence due to lack of divergence. To address this problem we then present the Fractional Global Best Formation (FGBF) technique, which collects all promising dimensional components and fractionally creates an artificial global-best particle (aGB) that has the potential to be a better "guide" than the PSO's native gbest particle. We finally propose an efficient color distance metric, which uses a fuzzy model for computing color (dis-)similarities over the HSV (or HSL) color space. Comparative evaluations against the MPEG-7 dominant color descriptor show the superiority of the proposed technique.
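The paper's fuzzy HSV model is not spelled out in the abstract; as a simpler stand-in, the sketch below shows one way to compare HSV colours that respects hue circularity and discounts the hue term for unsaturated colours, where hue is perceptually unstable.

```python
# A minimal sketch of an HSV colour distance; the paper's fuzzy model and the
# MD PSO clustering itself are more elaborate than this illustration.
import numpy as np

def hsv_distance(c1, c2):
    """c1, c2: (h, s, v) tuples with h in [0, 360), s and v in [0, 1]."""
    h1, s1, v1 = c1
    h2, s2, v2 = c2
    dh = min(abs(h1 - h2), 360.0 - abs(h1 - h2)) / 180.0  # circular hue gap in [0, 1]
    # weight the hue term down for unsaturated colours, where hue is unreliable
    return np.sqrt((dh * min(s1, s2)) ** 2 + (s1 - s2) ** 2 + (v1 - v2) ** 2)
```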
Citations: 12
An Empirical Study of Multi-label Learning Methods for Video Annotation
Pub Date : 2009-06-01 DOI: 10.1109/CBMI.2009.37
A. Dimou, Grigorios Tsoumakas, V. Mezaris, Y. Kompatsiaris, I. Vlahavas
This paper presents an experimental comparison of different approaches to learning from multi-labeled video data. We compare state-of-the-art multi-label learning methods on the Media mill Challenge dataset. We employ MPEG-7 and SIFT-based global image descriptors both independently and in conjunction, using variations of the stacking approach for their fusion. We evaluate the results by comparing the different classifiers on the MPEG-7 and SIFT-based descriptors and their fusion. A variety of multi-label evaluation measures is used to explore the advantages and disadvantages of the examined classifiers. The results give rise to interesting conclusions.
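As a hedged sketch of the descriptor-fusion setup, the code below trains binary-relevance classifiers on two feature views and stacks their label scores into a second-level model. The scikit-learn components are real, but the data shapes, the choice of logistic regression, and the use of training-set predictions for the meta level (rather than cross-validated ones) are simplifications, not the paper's exact protocol.

```python
# A minimal sketch of binary-relevance multi-label learning over two feature
# views, fused by stacking the first-level label scores.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

def fit_view(X, Y):
    """Binary relevance: one independent binary classifier per label."""
    return OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

def stacked_fusion(X_mpeg7, X_sift, Y, X_mpeg7_test, X_sift_test):
    """X_*: (n_samples, d) feature arrays; Y: (n_samples, n_labels) 0/1 matrix."""
    m1, m2 = fit_view(X_mpeg7, Y), fit_view(X_sift, Y)
    # Second level: stack the two sets of per-label scores as meta-features.
    Z_train = np.hstack([m1.predict_proba(X_mpeg7), m2.predict_proba(X_sift)])
    Z_test = np.hstack([m1.predict_proba(X_mpeg7_test), m2.predict_proba(X_sift_test)])
    meta = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    meta.fit(Z_train, Y)
    return meta.predict(Z_test)  # (n_test, n_labels) 0/1 predictions
```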
Citations: 56