3D Object Detection and Viewpoint Selection in Sketch Images Using Local Patch-Based Zernike Moments
Anh-Phuong Ta, Christian Wolf, G. Lavoué, A. Baskurt
DOI: 10.1109/CBMI.2009.29
In this paper we present a new approach to detecting and recognizing 3D models in 2D storyboards drawn during the production of animated cartoons. Our method is robust to occlusion, scale and rotation. The lack of texture and color makes it difficult to extract local features of the target object from a sketched storyboard, so existing approaches based on local descriptors such as interest points can fail on such images. We propose a new framework that combines patch-based Zernike descriptors with a method enforcing spatial constraints to accurately detect 3D models, represented as sets of 2D views, in the storyboards. Experimental results show that the proposed method can deal with partial object occlusion and is suitable for poorly textured objects.
{"title":"3D Object Detection and Viewpoint Selection in Sketch Images Using Local Patch-Based Zernike Moments","authors":"Anh-Phuong Ta, Christian Wolf, G. Lavoué, A. Baskurt","doi":"10.1109/CBMI.2009.29","DOIUrl":"https://doi.org/10.1109/CBMI.2009.29","url":null,"abstract":"In this paper we present a new approach to detect and recognize 3D models in 2D storyboards which have been drawn during the production process of animated cartoons. Our method is robust to occlusion, scale and rotation. The lack of texture and color makes it difficult to extract local features of the target object from the sketched storyboard. Therefore the existing approaches using local descriptors like interest points can fail in such images. We propose a new framework which combines patch-based Zernike descriptors with a method enforcing spatial constraints for exactly detecting 3D models represented as a set of 2D views in the storyboards. Experimental results show that the proposed method can deal with partial object occlusion and is suitable for poorly textured objects.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126985557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Content Based Copy Detection with Coarse Audio-Visual Fingerprints
A. Saracoglu, E. Esen, Tugrul K. Ates, Banu Oskay Acar, Ünal Zubari, Ezgi C. Ozan, Egemen Özalp, Aydin Alatan, T. Çiloglu
DOI: 10.1109/CBMI.2009.12
Content Based Copy Detection (CBCD) emerges as a viable alternative to the active detection methodology of watermarking: media already in circulation cannot be marked, and CBCD inherently withstands various severe attacks that watermarking cannot. Although media content is generally handled as visual and audio streams independently, in this work both information sources are utilized in a unified framework that employs coarse representations of fundamental features. From the copy detection perspective, the number of possible attacks on audio content is limited compared to the visual case; audio, where present, is therefore an indispensable part of a robust video copy detection system. In this study, the validity of this statement is demonstrated through various experiments on a large data set.
{"title":"Content Based Copy Detection with Coarse Audio-Visual Fingerprints","authors":"A. Saracoglu, E. Esen, Tugrul K. Ates, Banu Oskay Acar, Ünal Zubari, Ezgi C. Ozan, Egemen Özalp, Aydin Alatan, T. Çiloglu","doi":"10.1109/CBMI.2009.12","DOIUrl":"https://doi.org/10.1109/CBMI.2009.12","url":null,"abstract":"Content Based Copy Detection (CBCD) emerges as a viable choice against active detection methodology of watermarking. The very first reason is that the media already under circulation cannot be marked and secondly, CBCD inherently can endure various severe attacks, which watermarking cannot. Although in general, media content is handled independently as visual and audio in this work both information sources are utilized in a unified framework, in which coarse representation of fundamental features are employed. From the copy detection perspective, number of attacks on audio content is limited with respect to visual case. Therefore audio, if present, is an indispensable part of a robust video copy detection system. In this study, the validity of this statement is presented through various experiments on a large data set.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134600081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Video Browsing Using Interactive Navigation Summaries
Klaus Schöffmann, L. Böszörményi
DOI: 10.1109/CBMI.2009.40

A new approach for interactive video browsing is described. The novelty of the proposed approach is the flexible concept of interactive navigation summaries. Like the time sliders commonly found in standard software video players, navigation summaries allow random access to a video. In addition, they provide abstract visualizations of the content at a user-defined level of detail and thus quickly communicate content characteristics to the user. Navigation summaries can convey visual information about both low-level and high-level features. The concept fully integrates the user, who knows best which navigation summary at which level of detail is most beneficial for the current video browsing task, and provides him/her with a flexible set of navigation means. A first user study has shown that our approach can significantly outperform standard software video players - the state-of-the-art "poor man's" video browsing tool.
{"title":"Video Browsing Using Interactive Navigation Summaries","authors":"Klaus Schöffmann, L. Böszörményi","doi":"10.1109/CBMI.2009.40","DOIUrl":"https://doi.org/10.1109/CBMI.2009.40","url":null,"abstract":"A new approach for interactive video browsing is described. The novelty of the proposed approach is the flexible concept of interactive navigation summaries. Similar to time sliders, commonly used with standard soft video players, navigation summaries allow random access to a video. In addition, they also provide abstract visualizations of the content at a user-defined level of detail and, thus, quickly communicate content characteristics to the user. Navigation summaries can provide visual information about both low-level features but even high-level features. The concept fully integrates the user, who knows best which navigation summary at which level of detail could be most beneficial for his/her current video browsing task, and provide him/her a flexible set of navigation means. A first user study has shown that our approach can significantly outperform standard soft video players - the state-of-the art \"poor man's\" video browsing tool.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"28 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127560141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Special Session: Scalable Video Indexing
Ewa Kijak, J. Benois-Pineau
DOI: 10.1109/CBMI.2009.55

The scope of this special session is to cover all aspects of the indexing and retrieval of image and video content that deal with scalability issues (especially in the context of JPEG2000 and MPEG-4 AVC/H.264 coding). The papers from the session are briefly summarized.
{"title":"Special Session: Scalable Video Indexing","authors":"Ewa Kijak, J. Benois-Pineau","doi":"10.1109/CBMI.2009.55","DOIUrl":"https://doi.org/10.1109/CBMI.2009.55","url":null,"abstract":"The scope of this special session is to cover all aspects that relate to the indexing and retrieval of images and video content dealing with scalability issue (especially in the context of JPEG2000 and MPEG-4 AVC/H.264 coding). The papers from the session are briefly summarized.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123900065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Rushes Video Parsing Using Video Sequence Alignment
Emilie Dumont, B. Mérialdo
DOI: 10.1109/CBMI.2009.49

In this paper, we propose a novel method inspired by the bioinformatics domain to parse a rushes video into scenes and takes. The Smith-Waterman algorithm provides an efficient way to compare sequences by comparing segments of all possible lengths and optimizing the similarity measure. We adapt this method to detect repetitive sequences in rushes video. Based on the alignments found, we parse the video into scenes and takes; by comparing takes with each other, we can select the most complete take in each scene. The method is evaluated on several rushes videos from the TRECVID BBC Rushes Summarization campaign.
{"title":"Rushes Video Parsing Using Video Sequence Alignment","authors":"Emilie Dumont, B. Mérialdo","doi":"10.1109/CBMI.2009.49","DOIUrl":"https://doi.org/10.1109/CBMI.2009.49","url":null,"abstract":"In this paper, we propose a novel method inspired by the bio-informatics domain to parse a rushes video into scenes and takes. The Smith-Waterman algorithm provides an efficient way to compare sequences by comparing segments of all possible lengths and optimizing the similarity measure. We propose to adapt this method in order to detect repetitive sequences in rushes video. Based on the alignments found, we can parse the video into scenes and takes. By comparing takes together, we can select the most complete take in each scene. This method is evaluated on several rushes videos from the TRECVID BBC Rushes Summarization campaign.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130505644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Semi-automatic BPT for Image Retrieval
Shirin Ghanbari, J. Woods, S. Lucas
DOI: 10.1109/CBMI.2009.17

This paper presents a novel semi-automatic tool for content retrieval. A multi-dimensional Binary Partition Tree (BPT) is generated to perform object-based image retrieval. The tree is colour based but has the advantage of incorporating spatial frequency to form semantically meaningful tree nodes. For retrieval, a node of the query image is matched against the nodes of the BPT of each database image, according to a combination of colour histograms, texture features and edge histograms. This semi-automatic tool allows users more freedom in their choice of query. The paper illustrates how the use of multi-dimensional information can significantly enhance content retrieval results for natural images.
{"title":"Semi-automatic BPT for Image Retrieval","authors":"Shirin Ghanbari, J. Woods, S. Lucas","doi":"10.1109/CBMI.2009.17","DOIUrl":"https://doi.org/10.1109/CBMI.2009.17","url":null,"abstract":"This paper presents a novel semi-automatic tool for content retrieval. A multi-dimension Binary Partition Tree (BPT) is generated to perform object based image retrieval. The tree is colour based but has the advantage of incorporating spatial frequency to form semantically meaningful tree nodes. For retrieval, a node of the query image is matched against the nodes of the BPT of the database image. These are matched according to a combination of colour histograms, texture features and edge histograms. This semi-automatic tool allows users to have more freedom in their choice of query. The paper illustrates how the use of multi-dimensional information can significantly enhance content retrieval results for natural images.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132249679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Special Session: Multimedia Indexing for Content Based Search
M. Brambilla, F. Nucci
DOI: 10.1109/CBMI.2009.54

Worldwide, the volume of stored information is growing exponentially, and an increasing share of it is audiovisual content. This content drives the demand for new services, making audiovisual search one of the major challenges for organisations and businesses today. Digital data is the greatest asset that many organisations possess, and the ability to use it, rather than just store it, will be one of the most important strategic factors in the coming decade. In this scenario, several research efforts have focused on advanced search architectures that enable consumers, businesses, and organisations to unlock the value of audiovisual content through innovative access paradigms. In particular, current research focuses on managing and enabling access to information sources of all types, supporting advanced audiovisual processing and content handling that will enhance the control, creation, and sharing of multimedia for all users in the value chain. Several research projects financed by the European Commission tackle this problem from different perspectives and provide diverse visions for the future. In the mid to long term this will impact the audiovisual industry, allowing companies to provide more effective and efficient access to content thanks to innovative annotation techniques and search paradigms. The aim of this CBMI special session, organized with the support of the PHAROS project (Platform for searcHing of Audiovisual Resources across Online Spaces, Integrated Project IST-2005-2.6.3, financed by the EC IST 6th Framework), is to offer an overview of the research initiatives at European level that address the problems of processing, annotation, indexing, and provisioning of content within search applications. The session includes nine peer-reviewed contributions, reported in this volume, and an invited speech. The invited speech, given by professor Stefano Ceri of Politecnico di Milano, Italy, delivers a visionary discussion of Search Computing, a novel multi-disciplinary science that will provide the abstractions, foundations, methods, and tools required to answer cross-domain search queries that cannot be addressed by current search engines. A typical example of a multi-domain query is: "Where can I attend an interesting Information Retrieval conference close to a sunny beach, with a direct flight connection to Europe and nice, cheap hotel accommodation?". The generality of the problem makes it extremely relevant for the information retrieval community and poses additional challenges to the field of multimedia content annotation and indexing. The Search Computing project is currently financed by the ERC under the IDEAS Advanced Grants programme. The other contributions to the session include a work by Daras and Axenopoulos, who present a novel view-based approach for 3D object retrieval that exploits the automatic generation of a set of 2D images from a 3D object for calculating a glo…
{"title":"Special Session: Multimedia Indexing for Content Based Search","authors":"M. Brambilla, F. Nucci","doi":"10.1109/CBMI.2009.54","DOIUrl":"https://doi.org/10.1109/CBMI.2009.54","url":null,"abstract":"Worldwide, the volume of stored information is growing exponentially, and an increasing share is audiovisual content. This content drives the demand for new services, making audiovisual search one of the major challenges for organisations and businesses today. Digital data is the greatest value that many organisations possess, and the ability to use it, rather than just store it, will be one of the most important strategic aspects in the coming decade. In this scenario, several research efforts focused on studying advanced search architectures for enabling consumers, businesses, and organisations to unlock the values found in audiovisual content through innovative access paradigms. In particular, the focus of current researches is on managing and enabling access to information sources of all types, supporting advanced audiovisual processing and content handling that will enhance control, creation, and sharing of multimedia for all users in the value chain. Several research projects financed by the European Commission tackle this problem from different perspectives and provide diverse visions for the future. This will impact in the mid-long term on audiovisual industry, allowing companies to provide more effective and efficient access to contents thanks to innovative annotation techniques and search paradigms. The aim of this CBMI special session, organized with the support of the PHAROS project 1 , is to offer an overview of the research initiatives at European level that address the problems related to processing, annotation, indexing, and provisioning of contents within search applications. The session includes nine peer-reviewed contributions, reported in this volume, and an invited speech. The invited speech, given by professor Stefano Ceri, from Politecnico di Milano, Italy, will deliver a visionary discussion on the topic of Search Computing, a novel multi-disciplinary science which will provide the abstractions, foundations, methods, and tools required to answer cross-domain search queries, that cannot be addressed by 1 PHAROS (Platform for searcHing of Audiovisual Resources across Online Spaces) Integrated Project (IST-2005-2.6.3) financed by the EC IST 6th Framework. current search engines. A typical example of multi-domain query is “Where can I attend an interesting Information Retrieval conference close to a sunny beach, with direct flight connection to Europe and having a nice and cheap hotel accomodation?”. The generality of the problem makes it extremely relevant for the information retrieval community and poses additional challenges to the field of multimedia content annotation and indexing. The Search Computing project is currently financed by the ERC under the IDEAS Advanced Grants programme. 
The other contributions to the session include a work by Daras and Axenopoulos, that present a novel view-based approach for 3D object retrieval, that exploits automatic generation of a set of 2D images from a 3D object for calculating a glo","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133155485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Action Categorization in Soccer Videos Using String Kernels
Lamberto Ballan, M. Bertini, A. Bimbo, G. Serra
DOI: 10.1109/CBMI.2009.10

Action recognition is a crucial task in providing a high-level semantic description of video content, particularly for sports videos. The bag-of-words (BoW) approach has proven successful for the categorization of objects and scenes in images, but it is unable to model the temporal information between consecutive frames that video event recognition requires. In this paper, we present an approach that models an action as a sequence of histograms (one per frame), each represented with a traditional bag-of-words model. An action is thus described by a string (phrase) of variable size, depending on the clip's length, in which each frame's representation is treated as a character. To compare these strings we use the Needleman-Wunsch distance, a metric defined in information theory that handles strings of different lengths. Finally, SVMs with a string kernel that incorporates this distance are used to perform classification. Experimental results demonstrate the validity of the proposed approach and show that it outperforms baseline kNN classifiers.
{"title":"Action Categorization in Soccer Videos Using String Kernels","authors":"Lamberto Ballan, M. Bertini, A. Bimbo, G. Serra","doi":"10.1109/CBMI.2009.10","DOIUrl":"https://doi.org/10.1109/CBMI.2009.10","url":null,"abstract":"Action recognition is a crucial task to provide high-level semantic description of the video content, particularly in the case of sports videos. The bag-of-words (BoW) approach has proven to be successful for the categorization of objects and scenes in images, but it's unable to model temporal information between consecutive frames for video event recognition. In this paper, we present an approach to model actions as a sequence of histograms (one for each frame) represented using a traditional bag-of-words model. Actions are so described by a string (phrase) of variable size, depending on the clip's length, where each frame's representation is considered as a character. To compare these strings we use Needlemann-Wunsch distance, a metrics defined in the information theory, that deal with strings of different length. Finally, SVMs with a string kernel that includes this distance are used to perform classification. Experimental results demonstrate the validity of the proposed approach and they show that it outperforms baseline kNN classifiers.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130631345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dominant Color Extraction Based on Dynamic Clustering by Multi-dimensional Particle Swarm Optimization
S. Kiranyaz, Stefan Uhlmann, M. Gabbouj
DOI: 10.1109/CBMI.2009.11

Color is a major source of information, widely used in image analysis and content-based retrieval. Extracting the dominant colors that are prominent in a visual scene is of utmost importance, since the human visual system primarily uses them for perception. In this paper we address dominant color extraction as a dynamic clustering problem and use techniques based on Particle Swarm Optimization (PSO) for finding the optimal (number of) dominant colors under a given color space, distance metric, and validity index function. The first technique, Multi-Dimensional (MD) PSO, re-forms the native structure of swarm particles so that they can make inter-dimensional passes with a dedicated dimensional PSO process. Therefore, in a multi-dimensional search space where the optimum dimension is unknown, swarm particles can seek both positional and dimensional optima. Nevertheless, MD PSO is still susceptible to premature convergence due to a lack of divergence. To address this problem we then present the Fractional Global Best Formation (FGBF) technique, which collects all promising dimensional components and fractionally creates an artificial global-best particle (aGB) that has the potential to be a better "guide" than PSO's native gbest particle. We finally propose an efficient color distance metric that uses a fuzzy model for computing color (dis)similarities over the HSV (or HSL) color space. Comparative evaluations against the MPEG-7 dominant color descriptor show the superiority of the proposed technique.
{"title":"Dominant Color Extraction Based on Dynamic Clustering by Multi-dimensional Particle Swarm Optimization","authors":"S. Kiranyaz, Stefan Uhlmann, M. Gabbouj","doi":"10.1109/CBMI.2009.11","DOIUrl":"https://doi.org/10.1109/CBMI.2009.11","url":null,"abstract":"Color is the major source of information widely used in image analysis and content-based retrieval. Extracting dominant colors that are prominent in a visual scenery is of utter importance since human visual system primarily uses them for perception. In this paper we address dominant color extraction as a dynamic clustering problem and use techniques based on Particle Swarm Optimization (PSO) for finding optimal (number of) dominant colors in a given color space, distance metric and a proper validity index function. The first technique, so-called Multi-Dimensional (MD) PSO, re-forms the native structure of swarm particles in such a way that they can make inter-dimensional passes with a dedicated dimensional PSO process. Therefore, in a multidimensional search space where the optimum dimension is unknown, swarm particles can seek both positional and dimensional optima. Nevertheless, MD PSO is still susceptible to premature convergences due to lack of divergence. To address this problem we then present Fractional Global Best Formation (FGBF) technique, which basically collects all promising dimensional components and fractionally creates an artificial global-best particle (aGB) that has the potential to be a better “guide” than the PSO’s native gbest particle. We finally propose an efficient color distance metric, which uses a fuzzy model for computing color (dis-) similarities over HSV (or HSL) color space. The comparative evaluations against MPEG-7 dominant color descriptor show the superiority of the proposed technique.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132838276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

An Empirical Study of Multi-label Learning Methods for Video Annotation
A. Dimou, Grigorios Tsoumakas, V. Mezaris, Y. Kompatsiaris, I. Vlahavas
DOI: 10.1109/CBMI.2009.37
This paper presents an experimental comparison of different approaches to learning from multi-labeled video data. We compare state-of-the-art multi-label learning methods on the MediaMill Challenge dataset. We employ MPEG-7 and SIFT-based global image descriptors both independently and in conjunction, using variations of the stacking approach for their fusion. We evaluate the results by comparing the different classifiers on the MPEG-7 and SIFT-based descriptors and on their fusion. A variety of multi-label evaluation measures is used to explore the advantages and disadvantages of the examined classifiers. The results give rise to interesting conclusions.
{"title":"An Empirical Study of Multi-label Learning Methods for Video Annotation","authors":"A. Dimou, Grigorios Tsoumakas, V. Mezaris, Y. Kompatsiaris, I. Vlahavas","doi":"10.1109/CBMI.2009.37","DOIUrl":"https://doi.org/10.1109/CBMI.2009.37","url":null,"abstract":"This paper presents an experimental comparison of different approaches to learning from multi-labeled video data. We compare state-of-the-art multi-label learning methods on the Media mill Challenge dataset. We employ MPEG-7 and SIFT-based global image descriptors independently and in conjunction using variations of the stacking approach for their fusion. We evaluate the results comparing the different classifiers using both MPEG-7 and SIFT-based descriptors and their fusion. A variety of multi-label evaluation measures is used to explore advantages and disadvantages of the examined classifiers. Results give rise to interesting conclusions.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115947425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}