Video summarisation approaches have various fields of application, specifically related to organising, browsing and accessing large video databases. In this paper, the appropriateness of biologically inspired models for tackling these problems is discussed and a suitable strategy for unsupervised video summarisation is derived. In our proposal, we model the ability of ants to build live structures with their bodies in order to discover, in a distributed and unsupervised way, a tree-structured organisation and summarisation of the video data. An experimental evaluation validating the feasibility and robustness of this novel approach is presented.
T. Piatrik and E. Izquierdo, "Hierarchical Summarisation of Video Using Ant-Tree Strategy," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.50
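To make the ant-tree idea concrete, here is a minimal, hypothetical sketch: each data item (an "ant" carrying, say, a key-frame feature vector) walks down the tree towards the most similar existing node and attaches itself where similarity drops below a threshold. The cosine similarity and the threshold are illustrative choices, not the paper's actual algorithm.

```python
import numpy as np

class Node:
    def __init__(self, vec=None):
        self.vec = vec          # feature vector of the attached ant (None for the root)
        self.children = []

def similarity(a, b):
    # cosine similarity between two feature vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def ant_tree(items, threshold=0.8):
    """Greedy ant-tree sketch: each item descends towards the most similar
    child and attaches where it is no longer similar enough to any child."""
    root = Node()
    for vec in items:
        node = root
        while True:
            if not node.children:
                node.children.append(Node(vec))
                break
            best = max(node.children, key=lambda c: similarity(vec, c.vec))
            if similarity(vec, best.vec) >= threshold:
                node = best          # descend into the most similar subtree
            else:
                node.children.append(Node(vec))
                break
    return root
```

Run on two well-separated groups of vectors, this produces two subtrees under the root, i.e. a coarse tree-structured summary of the data.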
As MPEG standards prevail, opportunities to handle MPEG compressed video increase, and video indexing and management methods that operate directly on the compressed stream become important. MPEG video coding standards use motion compensation to compress video data, and motion compensation generates motion vectors that carry motion information similar to optical flow between regions in different frames. Although motion vectors are useful for video analysis, they are not always generated along moving objects, and it is difficult to analyze moving objects using these vectors alone. In this paper, we propose a moving object detection and tracking method in the MPEG compressed domain for video surveillance and management. Our method introduces images that record moving regions and accumulate unmoving regions in which moving objects are expected to exist after the current frame. By utilizing these images, we can detect and track moving objects using only motion vectors, even when the motion vectors of moving objects become zero vectors due to their behaviors or are lost due to their picture type. We demonstrate the effectiveness of the proposed method through several experiments using actual videos acquired by an MPEG video camera.
T. Yokoyama, Toshiki Iwasaki and Toshinori Watanabe, "Motion Vector Based Moving Object Detection and Tracking in the MPEG Compressed Domain," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.33
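The bookkeeping of moving and recently stopped regions described above could be sketched roughly as follows. This is a simplified, hypothetical version: a per-macroblock grid of motion-vector magnitudes updates a "moving" mask plus a hold counter that keeps a block detected for a few frames after its vectors vanish (the threshold and hold length are invented parameters, not the paper's).

```python
import numpy as np

def update_masks(mv_mag, moving, hold, thresh=1.0, hold_frames=5):
    """One step of simplified moving/unmoving region bookkeeping.
    mv_mag: 2-D array of motion-vector magnitudes per macroblock.
    moving: boolean mask of blocks that were moving in the previous frame.
    hold:   per-block countdown for blocks where an object recently stopped."""
    now_moving = mv_mag > thresh
    # Blocks that just stopped: keep them detected for a few frames, since the
    # object may be stationary or its vectors lost due to the picture type.
    stopped = moving & ~now_moving
    hold = np.where(stopped, hold_frames, np.maximum(hold - 1, 0))
    detected = now_moving | (hold > 0)
    return now_moving, hold, detected
```

Calling this once per decoded frame yields a detection mask that survives frames in which the object's motion vectors are zero.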
This paper exploits a media document representation called feature terms to generate a query from multiple media examples, e.g. images. A feature term denotes a continuous interval of a media feature dimension. This approach (1) supports feature accumulation from multiple examples and (2) enables the exploration of text-based retrieval models for multimedia retrieval. Three criteria, minimised χ2, minimised AC/DC and maximised entropy, are proposed to optimise feature term selection. Two ranking functions, KL divergence and BM25, are used for relevance estimation. Experiments on the Corel photo collection and the TRECVid 2006 collection show the effectiveness of the approach in image/video retrieval.
Reede Ren and J. Jose, "Query Generation from Multiple Media Examples," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.13
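A hypothetical sketch of the feature-term idea: quantise each feature dimension into fixed intervals so each (dimension, bin) pair acts like a text term, accumulate term frequencies over the example images, then rank with a simplified BM25 (no document-length normalisation; feature values assumed in [0, 1]; bin count and k1 are illustrative, not the paper's choices).

```python
import numpy as np

def feature_terms(vec, n_bins=8):
    """Map a feature vector (values assumed in [0, 1]) to discrete 'feature
    terms': one (dimension, bin) token per component."""
    bins = np.minimum((np.asarray(vec, float) * n_bins).astype(int), n_bins - 1)
    return [(d, b) for d, b in enumerate(bins)]

def query_from_examples(examples, n_bins=8):
    """Accumulate feature-term frequencies over several example images."""
    tf = {}
    for vec in examples:
        for t in feature_terms(vec, n_bins):
            tf[t] = tf.get(t, 0) + 1
    return tf

def bm25_score(query_tf, doc_terms, idf, k1=1.2):
    """Simplified BM25 over feature terms (length normalisation omitted)."""
    doc = set(doc_terms)
    score = 0.0
    for t, f in query_tf.items():
        if t in doc:
            score += idf.get(t, 0.0) * (f * (k1 + 1)) / (f + k1)
    return score
```

A document whose feature terms overlap the accumulated query terms scores higher than an unrelated one, which is the behaviour the text-based ranking functions are borrowed for.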
Picadomo combines content-based image retrieval and faceted search on mobile devices. It is designed for finding images with desired visual properties, tags or other known metadata. Due to the limitations of mobile devices, such as small screen sizes and low processing power, we had to carefully select the features used (dominant color, GPS data, tags, etc.). With Picadomo, the user can pick visualised facets directly via the touch screen, while very little screen space is used for facet-browsing navigation. We present our architecture, the facets used for image browsing, our new control concept and user experiments.
Adrian Hub, Daniel Blank, A. Henrich and W. Müller, "Picadomo: Faceted Image Browsing for Mobile Devices," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.34
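At its core, faceted browsing is conjunctive filtering over metadata fields plus per-value counts for the navigation UI. A minimal, generic sketch (the field names are invented examples, not Picadomo's actual schema):

```python
def filter_by_facets(images, selections):
    """Return images matching every selected facet value.
    images: list of dicts with facet fields; selections: {facet: value}."""
    return [img for img in images
            if all(img.get(f) == v for f, v in selections.items())]

def facet_counts(images, facet):
    """Counts shown next to each facet value in the browsing UI."""
    counts = {}
    for img in images:
        v = img.get(facet)
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    return counts
```

Each tap on a facet value adds an entry to `selections`, shrinking the result set while the counts tell the user how many images each further refinement would leave.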
In this paper we address the problem of scalable video indexing. We propose a new framework combining sparse spatial multiscale patches and Group of Pictures (GoP) motion patches. The distributions of these sets of patches are compared via the Kullback-Leibler divergence estimated in a non-parametric framework using a k-th Nearest Neighbor (kNN) estimator. We evaluated this similarity measure on selected videos from the ICOS-HD ANR project, probing in particular its robustness to resampling and compression and thus showing its scalability on heterogeneous networks.
Paolo Piro, S. Anthoine, E. Debreuve and M. Barlaud, "Scalable Spatio-Temporal Video Indexing Using Sparse Multiscale Patches," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.48
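The k-th nearest neighbour estimator of the Kullback-Leibler divergence mentioned above has a standard form: with samples x ~ P and y ~ Q, it compares each point's k-th NN distance within x (ρ) against its k-th NN distance to y (ν). A plain-NumPy sketch, brute-force over all pairs and therefore only suitable for small sample sets (the paper's patch features would be plugged in as the rows of x and y):

```python
import numpy as np

def knn_kl_divergence(x, y, k=1):
    """kNN estimate of KL(P || Q) from samples x ~ P (n, d) and y ~ Q (m, d):
    D ≈ (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, d = x.shape
    m = y.shape[0]
    # brute-force pairwise Euclidean distances
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)             # exclude each point's self-distance
    rho = np.sort(dxx, axis=1)[:, k - 1]      # k-th NN distance within x
    nu = np.sort(dxy, axis=1)[:, k - 1]       # k-th NN distance to y
    return d / n * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))
```

On samples drawn from the same distribution the estimate hovers near zero; on clearly separated distributions it grows with the true divergence, which is what makes it usable as a similarity measure between patch distributions.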
As the Web becomes a platform for consuming multimedia content, audiovisual search assumes a central role in providing users with the content best suited to their information needs. A key issue in enabling audiovisual search is extracting indexable knowledge from opaque media. Such a process is heavily constrained by scalability and performance issues and must be able to flexibly incorporate specialised components for extracting selected features from media elements. This paper shows how a model-driven approach can help designers specify multimedia indexing processes, verify properties of interest in such processes, and generate the code that orchestrates the components, enabling rapid prototyping of content analysis processes in the presence of evolving requirements.
P. Fraternali, M. Brambilla and A. Bozzon, "Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.51
SVM is one of the state-of-the-art techniques for image and video classification. When multiple kernels are available, the recently introduced multiple kernel SVM (MK-SVM) learns an optimal linear combination of the kernels, providing a new method for information fusion. In this paper we study how the behaviour of MK-SVM is affected by the norm used to regularise the kernel weights to be learnt. Through experiments on three image/video classification datasets as well as on synthesised data, new insights are gained as to how the choice of regularisation norm should be made, especially when MK-SVM is applied to image/video classification problems.
F. Yan, K. Mikolajczyk, J. Kittler and M. Tahir, "A Comparison of L_1 Norm and L_2 Norm Multiple Kernel SVMs in Image and Video Classification," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.44
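To illustrate what the choice of norm does to the learnt combination, here is a deliberately simplified sketch. The paper learns the weights by solving the MK-SVM optimisation; this sketch substitutes a simple kernel-alignment heuristic (not the paper's method) just to show how L1 normalisation puts the weights on the simplex while L2 normalisation spreads them over the unit ball.

```python
import numpy as np

def kernel_alignment(K, y):
    """Centred-style alignment between a kernel matrix K and labels y (+/-1)."""
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def combine_kernels(kernels, y, norm="l1"):
    """Heuristic MKL sketch: weight each base kernel by its label alignment,
    then normalise the weight vector under the chosen norm."""
    w = np.array([max(kernel_alignment(K, y), 0.0) for K in kernels])
    if norm == "l1":
        w = w / np.sum(w)                 # weights sum to 1 (simplex, sparser)
    else:
        w = w / np.linalg.norm(w)         # unit L2 norm (denser spread)
    return sum(wi * K for wi, K in zip(w, kernels)), w
```

Either combined kernel could then be fed to any SVM that accepts a precomputed Gram matrix; the point of the paper is precisely how this normalisation choice affects downstream classification.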
In this paper a cross-media analysis scheme for the semantic interpretation of compound documents is presented. It is essentially a late-fusion mechanism that operates on top of the output of single-media extractors; its main novelty lies in using the evidence extracted from heterogeneous media sources to perform probabilistic inference on a Bayesian network that incorporates knowledge about the domain. Experiments performed on a set of 54 compound documents showed that the proposed scheme is able to exploit the existing cross-media relations and achieve performance improvements.
S. Nikolopoulos, Christina Lakka, Y. Kompatsiaris, Christos Varytimidis, Konstantinos Rapantzikos and Yannis Avrithis, "Compound Document Analysis by Fusing Evidence Across Media," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.35
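A much-reduced sketch of probabilistic late fusion: the paper performs inference on a domain Bayesian network, whereas this illustration assumes the per-media evidence is conditionally independent given the concept (i.e. naive-Bayes fusion), which is the simplest special case of that network.

```python
import numpy as np

def fuse_evidence(priors, likelihoods):
    """Naive-Bayes late fusion of per-media detector outputs.
    priors: P(concept) over the candidate concepts.
    likelihoods: one array per media source of P(evidence | concept),
    indexed over the same concepts. Assumes conditional independence."""
    log_post = np.log(np.asarray(priors, float))
    for lik in likelihoods:
        log_post += np.log(np.asarray(lik, float))
    post = np.exp(log_post - log_post.max())   # subtract max for stability
    return post / post.sum()
```

When the text and image detectors agree on a concept, the fused posterior concentrates on it more sharply than either detector alone, which is the basic benefit cross-media fusion is after.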
Semantic scene classification is a challenging research problem that aims to categorise images into semantic classes such as beaches, sunsets or mountains. This problem can be formulated as a multi-label classification problem in which an image can belong to more than one conceptual class, such as sunsets and beaches, at the same time. Recently, Kernel Discriminant Analysis combined with spectral regression (SR-KDA) has been successfully used for face, text and spoken letter recognition. However, SR-KDA works only with positive definite symmetric matrices. In this paper, we modify the method to support both definite and indefinite symmetric matrices. The main idea is to use the LDL^T decomposition instead of the Cholesky decomposition. The modified SR-KDA is applied to a scene database involving six concepts. We validate the advocated approach and demonstrate that it yields significant performance gains when a conditionally positive definite triangular kernel is used instead of positive definite symmetric kernels such as the linear, polynomial or RBF kernels. The results also indicate performance gains when compared with state-of-the-art multi-label methods for semantic scene classification.
M. Tahir, J. Kittler, F. Yan and K. Mikolajczyk, "Kernel Discriminant Analysis Using Triangular Kernel for Semantic Scene Classification," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.47
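The reason LDL^T replaces Cholesky is visible on a tiny example: Cholesky requires a positive definite matrix, while LDL^T factorises any symmetric matrix (the diagonal D simply picks up negative entries). A minimal textbook implementation without pivoting, shown for illustration only — production code would use a pivoted routine:

```python
import numpy as np

def ldl_decompose(A):
    """LDL^T decomposition of a symmetric matrix (no pivoting). Unlike
    Cholesky, it does not require A to be positive definite, so it also
    handles the indefinite kernel matrices that conditionally positive
    definite kernels (e.g. the triangular kernel) can produce."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d
```

For an indefinite symmetric matrix such as [[2, 1], [1, -1]], `np.linalg.cholesky` raises an error, while `ldl_decompose` returns a valid factorisation with a negative entry in D.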
In video content analysis, growing research effort aims at characterising a specific type of unedited content, called rushes. This raw material, used by broadcasters and film studios when editing video programmes, usually lies unannotated in huge databases. In this work we aim at retrieving a desired type of rush by representing the whole database content in a multimodal space. Each rush is mapped to a trajectory whose coordinates are derived from multimodal features and the filming techniques used by cameramen while shooting. The evolution of the trajectory over time provides a strong characterisation of the video, so that different types of rushes fall into different regions of the multimodal space. The effectiveness of the proposed method has been tested by retrieving similar rushes from a large database provided by EiTB, the main broadcaster of the Basque Country.
Sergio Benini, Luca Canini, P. Migliorati and R. Leonardi, "Multimodal Space for Rushes Representation and Retrieval," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, 2009. doi:10.1109/CBMI.2009.28
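A hypothetical sketch of comparing such trajectories: each rush becomes a sequence of multimodal feature points over time; resampling to a fixed length makes rushes of different durations comparable, after which a simple mean pointwise distance ranks candidates. The resampling-plus-Euclidean scheme is an illustrative choice, not the paper's actual retrieval metric.

```python
import numpy as np

def resample(traj, n=32):
    """Linearly resample a trajectory of shape (T, d) to n points so rushes
    of different lengths live on a common time axis."""
    traj = np.asarray(traj, float)
    t_old = np.linspace(0, 1, len(traj))
    t_new = np.linspace(0, 1, n)
    return np.stack([np.interp(t_new, t_old, traj[:, k])
                     for k in range(traj.shape[1])], axis=1)

def trajectory_distance(a, b, n=32):
    """Mean pointwise Euclidean distance between two resampled trajectories."""
    return float(np.mean(np.linalg.norm(resample(a, n) - resample(b, n), axis=1)))
```

Two rushes tracing the same path at different frame rates come out nearly identical, while a rush whose multimodal evolution differs (e.g. reversed camera dynamics) lands far away, matching the intuition that rush types occupy different regions of the space.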