This paper focuses on query-focused video summarization, an extended task of video summarization that aims to automatically generate a user-oriented summary by highlighting the frames/shots relevant to a query. The task differs from traditional video summarization in that it attends to users' subjectivity through queries. Diversity is widely recognized as an important property in video summarization. However, existing methods consider diversity only as the dissimilarity between frames/shots, which is far from user-oriented summarization. Users' different understandings of a video should also be an important source of diversity, reflected in the process of eliminating query-unrelated redundancy. To this end, this paper explores user-diversified & query-focused video summarization via a carefully devised hierarchical variational network called HVN. HVN has three distinctive characteristics: (i) a hierarchical structure that models query-related long-range temporal dependencies; (ii) diverse attention mechanisms that encode query-related and context-important information and balance the two; and (iii) a multilevel self-attention module and a variational autoencoder module that add user-oriented diversity and stochastic factors. Experimental results demonstrate that HVN not only outperforms state-of-the-art methods but also improves user-oriented diversity to some extent.
{"title":"Hierarchical Variational Network for User-Diversified & Query-Focused Video Summarization","authors":"Pin Jiang, Yahong Han","doi":"10.1145/3323873.3325040","DOIUrl":"https://doi.org/10.1145/3323873.3325040","url":null,"abstract":"This paper focuses on the query-focused video summarization, which is an extended task of video summarization and aims to automatically generate user-oriented summary by highlighting frames/shots relevant to the query. This task is different from traditional video summarization in paying attention to users' subjectivity through queries. Diversity is a recognized important property in video summarization. However, existing methods only consider diversity as the dissimilarity between frames/shots which is far from user-oriented summarization. Users' different understandings of video should be an important source of diversity, reflected in the process of eliminating query-unrelated redundancy. To this end, this paper explores user-diversified & query-focused video summarization via a well-devised hierarchical variational network called HVN. HVN has three distinctive characteristics: (i) it has a hierarchical structure to model query-related long-range temporal dependency; (ii) it employs diverse attention mechanisms to encode query-related and context-important information and makes them balanced; (iii) it employs a multilevel self-attention module and a variational autoencoder module to add user-oriented diversity and stochastic factors. Experimental results demonstrate that HVN not only outperforms the state-of-the-arts but also improves the user-oriented diversity to some extent.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132694053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-database micro-expression recognition (CDMER) is one of the recently emerging and interesting problems in micro-expression analysis. CDMER is more challenging than conventional micro-expression recognition (MER) because the training and testing samples come from different micro-expression databases, resulting in inconsistent feature distributions between the training and testing sets. In this paper, we contribute to this topic from two aspects. First, we establish a CDMER experimental evaluation protocol that provides a standard platform on which researchers can evaluate their proposed methods. Second, we conduct extensive benchmark experiments using nine state-of-the-art domain adaptation (DA) methods and six popular spatiotemporal descriptors, investigating the CDMER problem from two different perspectives, and we analyze and discuss the experimental results in depth. In addition, all the data and code related to CDMER in this paper are released on our project website: http://aip.seu.edu.cn/cdmer.
{"title":"Cross-Database Micro-Expression Recognition: A Benchmark","authors":"Yuan Zong, Wenming Zheng, Xiaopeng Hong, Chuangao Tang, Zhen Cui, Guoying Zhao","doi":"10.1145/3323873.3326590","DOIUrl":"https://doi.org/10.1145/3323873.3326590","url":null,"abstract":"Cross-database micro-expression recognition (CDMER) is one of recently emerging and interesting problems in micro-expression analysis. CDMER is more challenging than the conventional micro-expression recognition (MER), because the training and testing samples in CDMER come from different micro-expression databases, resulting in inconsistency of the feature distributions between the training and testing sets. In this paper, we contribute to this topic from two aspects. First, we establish a CDMER experimental evaluation protocol and provide a standard platform for evaluating their proposed methods. Second, we conduct extensive benchmark experiments by using NINE state-of-the-art domain adaptation (DA) methods and SIX popular spatiotemporal descriptors for investigating the CDMER problem from two different perspectives and deeply analyze and discuss the experimental results. In addition, all the data and codes involving CDMER in this paper are released on our project website: http://aip.seu.edu.cn/cdmer.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134517612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite the impressive success of deep neural networks when their hyper-parameters are suitably tuned, the design of good network architectures remains an art form rather than a science: while various search techniques, such as grid search, have been proposed to find effective hyper-parameter configurations, these parameters are often hand-crafted (or the bounds of the search space are provided by a user). In this paper, we argue, and experimentally show, that we can minimize the need for hand-crafting by relying on the dataset itself. In particular, we show that the dimensions, distributions, and complexities of localized features extracted from the data can inform the structure of the neural network and help better allocate limited resources (such as kernels) to the various layers of the network. To achieve this, we first present several hypotheses that link the properties of localized image features to CNN and RCNN architectures and then, relying on these hypotheses, present the RACKNet framework, which aims to learn multiple hyper-parameters by extracting information encoded in the input datasets. Experimental evaluations of RACKNet on major benchmark datasets, such as MNIST, SVHN, CIFAR10, COIL20, and ImageNet, show that RACKNet provides significant improvements in network design and robustness to changes in the network.
{"title":"RACKNet","authors":"Yash Garg, K. Candan","doi":"10.1145/3323873.3325057","DOIUrl":"https://doi.org/10.1145/3323873.3325057","url":null,"abstract":"Despite their impressive success when these hyper-parameters are suitably fine-tuned, the design of good network architectures remains an art-form rather than a science: while various search techniques, such as grid-search, have been proposed to find effective hyper-parameter configurations, often these parameters are hand-crafted (or the bounds of the search space are provided by a user). In this paper, we argue, and experimentally show, that we can minimize the need for hand-crafting, by relying on the dataset itself. In particular, we show that the dimensions, distributions, and complexities of localized features extracted from the data can inform the structure of the neural networks and help better allocate limited resources (such as kernels) to the various layers of the network. To achieve this, we first present several hypotheses that link the properties of the localized image features to the CNN and RCNN architectures and then, relying on these hypotheses, present a RACKNet framework which aims to learn multiple hyper-parameters by extracting information encoded in the input datasets. Experimental evaluations of RACKNet against major benchmark datasets, such as MNIST, SVHN, CIFAR10, COIL20 and ImageNet, show that RACKNet provides significant improvements in the network design and robustness to change in the network.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115613744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simultaneously running multiple modules is a key requirement for a smart multimedia system for facial applications, including face recognition, facial expression understanding, and gender identification. To integrate them effectively, we introduce a continual learning approach that learns new tasks without forgetting. Unlike previous methods, which grow monotonically in size, our approach maintains compactness during continual learning. The proposed packing-and-expanding method is effective and easy to implement: it can iteratively shrink and enlarge the model to integrate new functions. Our integrated multitask model achieves similar accuracy with only 39.9% of the original size.
{"title":"Increasingly Packing Multiple Facial-Informatics Modules in A Unified Deep-Learning Model via Lifelong Learning","authors":"Steven C. Y. Hung, Jia-Hong Lee, Timmy S. T. Wan, Chien-Hung Chen, Yi-Ming Chan, Chu-Song Chen","doi":"10.1145/3323873.3325053","DOIUrl":"https://doi.org/10.1145/3323873.3325053","url":null,"abstract":"Simultaneously running multiple modules is a key requirement for a smart multimedia system for facial applications including face recognition, facial expression understanding, and gender identification. To effectively integrate them, a continual learning approach to learn new tasks without forgetting is introduced. Unlike previous methods growing monotonically in size, our approach maintains the compactness in continual learning. The proposed packing-and-expanding method is effective and easy to implement, which can iteratively shrink and enlarge the model to integrate new functions. Our integrated multitask model can achieve similar accuracy with only 39.9% of the original size.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"59 24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115763193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion capture technologies digitize human movements by tracking the 3D positions of specific skeleton joints over time. Such spatio-temporal multimedia data have enormous application potential in many fields, ranging from computer animation through security and sports to medicine, but their computerized processing is a difficult problem. In this paper, we focus on the important task of recognizing a user-defined motion, based on a collection of labelled actions known in advance. We utilize current advances in deep feature learning and scalable similarity retrieval to build an effective and efficient k-nearest-neighbor recognition technique for 3D human motion data. The properties of the technique are demonstrated by a web application that allows a user to browse long motion sequences and specify any subsequence as the input for probabilistic recognition based on 130 predefined classes.
{"title":"Recognizing User-Defined Subsequences in Human Motion Data","authors":"J. Sedmidubský, P. Zezula","doi":"10.1145/3323873.3326922","DOIUrl":"https://doi.org/10.1145/3323873.3326922","url":null,"abstract":"Motion capture technologies digitize human movements by tracking 3D positions of specific skeleton joints in time. Such spatio-temporal multimedia data have an enormous application potential in many fields, ranging from computer animation, through security and sports to medicine, but their computerized processing is a difficult problem. In this paper, we focus on an important task of recognition of a user-defined motion, based on a collection of labelled actions known in advance. We utilize current advances in deep feature learning and scalable similarity retrieval to build an effective and efficient k-nearest-neighbor recognition technique for 3D human motion data. The properties of the technique are demonstrated by a web application which allows a user to browse long motion sequences and specify any subsequence as the input for probabilistic recognition based on 130 predefined classes.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115775415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributing multimedia indexes over multiple nodes enables search over very large datasets (i.e., over one billion images and videos), but it comes with a set of challenges: how to distribute documents and queries effectively across nodes to support concurrent querying, and how to deal with the increased potential for lack of response from nodes (e.g., node fail-stops or dropped network packets)? An index whose partitions are based on the distribution of feature vectors in the original space can improve redundancy and increase efficiency: nearest neighbors are present on only a small, fixed number of partitions, reducing the number of nodes to inspect for each query. This paper describes how sparse hashes can help find this balance and create better distribution policies for high-dimensional feature vectors. Inspired by existing literature on distributed text and media indexes, our proposal distributes and balances documents and queries over a subset of the nodes according to their orthogonal similarities. We performed exhaustive benchmarks of our approach on a commercial cloud service. Experiments on a one-billion-vector dataset show that our approach has a low partitioning overhead (3 to 5 ms per query), achieves balanced document and query distribution (the variation in document and query distribution across nodes is smaller than 1% and 10%, respectively), handles concurrent queries effectively, and degrades gracefully under node failures (less than 2% precision loss per node down).
{"title":"Towards Cloud Distributed Image Indexing by Sparse Hashing","authors":"André Mourão, João Magalhães","doi":"10.1145/3323873.3325046","DOIUrl":"https://doi.org/10.1145/3323873.3325046","url":null,"abstract":"Distributing multimedia indexes to multiple nodes enables search over very large datasets (i.e., over one billion images and videos), but comes with a set of challenges: textithow to distribute documents and queries effectively across nodes to support concurrent querying? andhow to deal with the increased potential for lack of response from nodes (e.g., node fail-stops or dropping of network packages)? An index where partitions are based on the distribution of feature vectors in the original space can improve redundancy and increase efficiency: nearest neighbors are only present on a small, set number of partitions, reducing the number of nodes to inspect for each query. This paper describes how sparse hashes can help find this balance and create better distribution policies for high-dimensional feature vectors. Inspired by existing literature on distributed text and media indexes, our proposal distributes and balances documents and queries to a subset of the nodes, according to their orthogonal similarities. We performed exhaustive benchmarks of our approach on a commercial cloud service. Experiments on a one billion vector dataset show that our approach has a low partitioning overhead (3 to 5 ms per query), achieves balanced document and query distribution (the variation in document and query distribution across nodes is smaller than 1% and 10%, respectively), handles concurrent queries effectively and degrades gracefully with node failures (less than 2% of precision loss per node down).","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130840048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding objects and the relations between them is indispensable to fine-grained video content analysis, which has been widely studied in recent research on multimedia and computer vision. However, existing works are limited to evaluating with either small datasets or indirect metrics, such as performance over images. The underlying reason is that constructing a large-scale video dataset with dense annotations is tricky and costly. In this paper, we address several main issues in annotating objects and relations in user-generated videos, and propose an annotation pipeline that can be executed at a modest cost. As a result, we present a new dataset, named VidOR, consisting of 10k videos (84 hours) together with dense annotations that localize 80 categories of objects and 50 categories of predicates in each video. We have made the training and validation sets public and extendable for more tasks to facilitate future research on video object and relation recognition.
{"title":"Annotating Objects and Relations in User-Generated Videos","authors":"Xindi Shang, Donglin Di, Junbin Xiao, Yu Cao, Xun Yang, Tat-Seng Chua","doi":"10.1145/3323873.3325056","DOIUrl":"https://doi.org/10.1145/3323873.3325056","url":null,"abstract":"Understanding the objects and relations between them is indispensable to fine-grained video content analysis, which is widely studied in recent research works in multimedia and computer vision. However, existing works are limited to evaluating with either small datasets or indirect metrics, such as the performance over images. The underlying reason is that the construction of a large-scale video dataset with dense annotation is tricky and costly. In this paper, we address several main issues in annotating objects and relations in user-generated videos, and propose an annotation pipeline that can be executed at a modest cost. As a result, we present a new dataset, named VidOR, consisting of 10k videos (84 hours) together with dense annotations that localize 80 categories of objects and 50 categories of predicates in each video. We have made the training and validation set public and extendable for more tasks to facilitate future research on video object and relation recognition.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"AES-19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126549335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most existing image enhancement methods function like a black box and cannot clearly reveal the procedure behind each enhancement operation. To overcome this limitation, in this paper we design a progressive image enhancement framework that generates an expected "good" retouched image with a group of self-interpretable image filters under the guidance of an aesthetic assessment model. The introduced aesthetic network effectively alleviates the shortage of paired training samples by providing extra supervision, and it eliminates the bias caused by human subjective preferences. The self-interpretable image filters designed in our framework make the overall enhancement procedure easy to understand. Extensive experiments demonstrate the effectiveness of the proposed framework.
{"title":"Progressive Image Enhancement under Aesthetic Guidance","authors":"Xiaoyu Du, Xun Yang, Zhiguang Qin, Jinhui Tang","doi":"10.1145/3323873.3325055","DOIUrl":"https://doi.org/10.1145/3323873.3325055","url":null,"abstract":"Most existing image enhancement methods function like a black box, which cannot clearly reveal the procedure behind each image enhancement operation. To overcome this limitation, in this paper, we design a progressive image enhancement framework, which generates an expected \"good\" retouched image with a group of self-interpretable image filters under the guidance of an aesthetic assessment model. The introduced aesthetic network effectively alleviates the shortage of paired training samples by providing extra supervision, and eliminate the bias caused by human subjective preferences. The self-interpretable image filters designed in our image enhancement framework, make the overall image enhancing procedure easy-to-understand. Extensive experiments demonstrate the effectiveness of our proposed framework.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126852041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Techniques and tools designed for information retrieval, data exploration, and data analytics are based on the relational and text-search models and cannot be easily applied to unstructured data such as images or videos. Research communities have been trying to reveal the semantics of multimedia over the last decades, with ever-improving results in various tasks, dominated by the recent success of deep learning. The limits of object retrieval models drive the need for data exploration methods that support multi-modal data, such as multimedia accompanied by structured attributes. In this paper, we describe, implement, and evaluate exploration methods that use multiple modalities and retrieval models in the context of multimedia. We apply the techniques to e-commerce product search and recommendation, and demonstrate their benefit in different retrieval scenarios. Lastly, we propose a method for extending a database schema with latent visual attributes learned from image data. This closes the loop by going back to relational data, potentially benefiting a range of industrial applications.
{"title":"Methods of Multi-Modal Data Exploration","authors":"Tomás Grosup","doi":"10.1145/3323873.3325858","DOIUrl":"https://doi.org/10.1145/3323873.3325858","url":null,"abstract":"Techniques and tools designed for information retrieval, data exploration or data analytical tasks are based on the relational and text-search model, and cannot be easily applied to unstructured data such as images or videos. Researcher communities have been trying to reveal the semantics of multimedia in the last decades with ever-improving results in various tasks, dominated by the latest success of deep learning. Limits of object retrieval models drive the need for data exploration methods that support multi-modal data, like multimedia surrounded by structured attributes. In this paper, we describe, implement and evaluate exploration methods using multiple modalities and retrieval models in the context of multimedia. We apply the techniques in e-commerce product search and recommending, and demonstrate benefit for different retrieval scenarios. Lastly, we propose a method for extending database schema by latent visual attributes learned from image data. This enables closing the loop by going back to relational data, and potentially benefiting a range of industrial applications.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134182586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite the major advances in feature development for low- and mid-level representations, a single visual feature is often insufficient to achieve effective retrieval results in different scenarios. Since diverse visual properties provide distinct and often complementary information for the same query, combining different features, including handcrafted and learned ones, has become a relevant trend in image retrieval. An intrinsically difficult task consists of selecting and combining the features that yield highly effective results, which is often supported by supervised learning methods. However, in the absence of labeled data, selecting and fusing features in a completely unsupervised fashion becomes an essential, although very challenging, task. The proposed genetic algorithm employs effectiveness estimation measures as fitness functions, making the evolutionary process fully unsupervised. Our approach was evaluated on 3 public datasets and 35 different descriptors, achieving relative gains of up to +53.96% in scenarios with more than 8 billion possible combinations of rankers. The framework was also compared with different baselines, including state-of-the-art methods.
{"title":"An Unsupervised Genetic Algorithm Framework for Rank Selection and Fusion on Image Retrieval","authors":"Lucas Pascotti Valem, D. C. G. Pedronette","doi":"10.1145/3323873.3325022","DOIUrl":"https://doi.org/10.1145/3323873.3325022","url":null,"abstract":"Despite the major advances on feature development for low and mid-level representations, a single visual feature is often insufficient to achieve effective retrieval results in different scenarios. Since diverse visual properties provide distinct and often complementary information for a same query, the combination of different features, including handcrafted and learned features, has been establishing as a relevant trend in image retrieval. An intrinsic difficulty task consists in selecting and combining features that provide a high-effective result, which is often supported by supervised learning methods. However, in the absence of labeled data, selecting and fusing features in a completely unsupervised fashion becomes an essential, although very challenging task. The proposed genetic algorithm employs effectiveness estimation measures as fitness functions, making the evolutionary process fully unsupervised. Our approach was evaluated considering 3 public datasets and 35 different descriptors achieving relative gains up to +53.96% in scenarios with more than 8 billion possible combinations of rankers. The framework was also compared to different baselines, including state-of-the-art methods.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}