
Proceedings of the 2019 on International Conference on Multimedia Retrieval: Latest Publications

Hierarchical Variational Network for User-Diversified & Query-Focused Video Summarization
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325040
Pin Jiang, Yahong Han
This paper focuses on query-focused video summarization, an extended task of video summarization that aims to automatically generate a user-oriented summary by highlighting frames/shots relevant to the query. This task differs from traditional video summarization in paying attention to users' subjectivity through queries. Diversity is a recognized important property in video summarization. However, existing methods only consider diversity as the dissimilarity between frames/shots, which is far from user-oriented summarization. Users' different understandings of a video should be an important source of diversity, reflected in the process of eliminating query-unrelated redundancy. To this end, this paper explores user-diversified & query-focused video summarization via a well-devised hierarchical variational network called HVN. HVN has three distinctive characteristics: (i) it has a hierarchical structure to model query-related long-range temporal dependency; (ii) it employs diverse attention mechanisms to encode query-related and context-important information and keep them balanced; (iii) it employs a multilevel self-attention module and a variational autoencoder module to add user-oriented diversity and stochastic factors. Experimental results demonstrate that HVN not only outperforms state-of-the-art methods but also improves user-oriented diversity to some extent.
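The abstract gives no implementation details; purely as an illustration of how a variational-autoencoder module of the kind mentioned in point (iii) can inject stochastic factors into shot-level features, here is a minimal NumPy sketch of a VAE-style bottleneck with the reparameterization trick. All dimensions, weights, and names are hypothetical and are not taken from the paper.

```python
# Minimal, hypothetical sketch of a VAE-style bottleneck (reparameterization trick),
# illustrating how a variational module can add stochastic factors to a shot feature.
# Not the authors' implementation; sizes and weights are made up.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_LATENT = 128, 32                      # hypothetical feature / latent dimensions

# Randomly initialized linear maps stand in for learned encoder/decoder weights.
W_mu = rng.normal(scale=0.1, size=(D_IN, D_LATENT))
W_logvar = rng.normal(scale=0.1, size=(D_IN, D_LATENT))
W_dec = rng.normal(scale=0.1, size=(D_LATENT, D_IN))

def vae_bottleneck(x):
    """Encode a shot feature x, sample a latent code, and decode it back."""
    mu = x @ W_mu                             # latent mean
    logvar = x @ W_logvar                     # latent log-variance
    eps = rng.normal(size=mu.shape)           # stochastic factor
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    x_rec = z @ W_dec                         # decoded (reconstructed) feature
    return z, x_rec

shot_feature = rng.normal(size=D_IN)
z, recon = vae_bottleneck(shot_feature)
print(z.shape, recon.shape)                   # (32,) (128,)
```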
Citations: 11
Cross-Database Micro-Expression Recognition: A Benchmark
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3326590
Yuan Zong, Wenming Zheng, Xiaopeng Hong, Chuangao Tang, Zhen Cui, Guoying Zhao
Cross-database micro-expression recognition (CDMER) is one of the recently emerging and interesting problems in micro-expression analysis. CDMER is more challenging than conventional micro-expression recognition (MER), because the training and testing samples in CDMER come from different micro-expression databases, resulting in inconsistency of the feature distributions between the training and testing sets. In this paper, we contribute to this topic from two aspects. First, we establish a CDMER experimental evaluation protocol and provide a standard platform for researchers to evaluate their proposed methods. Second, we conduct extensive benchmark experiments using NINE state-of-the-art domain adaptation (DA) methods and SIX popular spatiotemporal descriptors to investigate the CDMER problem from two different perspectives, and we deeply analyze and discuss the experimental results. In addition, all the data and code involving CDMER in this paper are released on our project website: http://aip.seu.edu.cn/cdmer.
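The benchmark protocol itself is defined by the authors and released on the project website; the following is only a generic, hypothetical sketch of what a cross-database (train on one database, test on another) evaluation loop looks like. The simulated databases and the nearest-centroid classifier are placeholder assumptions, not the paper's actual descriptors or DA methods.

```python
# Hypothetical sketch of a cross-database evaluation loop: train on one
# micro-expression database, test on another. Data and the nearest-centroid
# classifier are placeholders, not the benchmark's actual features or methods.
import numpy as np

rng = np.random.default_rng(1)
N_CLASSES, D = 3, 64

def fake_database(n_samples, shift):
    """Simulate one database; `shift` mimics the feature-distribution gap."""
    y = rng.integers(0, N_CLASSES, size=n_samples)
    x = rng.normal(size=(n_samples, D)) + y[:, None] + shift
    return x, y

databases = {"DB_A": fake_database(200, 0.0), "DB_B": fake_database(150, 0.5)}

def nearest_centroid(train_x, train_y, test_x):
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in range(N_CLASSES)])
    dists = ((test_x[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

for src, (xs, ys) in databases.items():
    for tgt, (xt, yt) in databases.items():
        if src == tgt:
            continue  # cross-database: source and target must differ
        acc = (nearest_centroid(xs, ys, xt) == yt).mean()
        print(f"train on {src}, test on {tgt}: accuracy = {acc:.3f}")
```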
Citations: 2
RACKNet
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325057
Yash Garg, K. Candan
Despite the impressive success of deep networks when their hyper-parameters are suitably fine-tuned, the design of good network architectures remains an art form rather than a science: while various search techniques, such as grid search, have been proposed to find effective hyper-parameter configurations, these parameters are often hand-crafted (or the bounds of the search space are provided by a user). In this paper, we argue, and experimentally show, that we can minimize the need for hand-crafting by relying on the dataset itself. In particular, we show that the dimensions, distributions, and complexities of localized features extracted from the data can inform the structure of the neural networks and help better allocate limited resources (such as kernels) to the various layers of the network. To achieve this, we first present several hypotheses that link the properties of the localized image features to the CNN and RCNN architectures and then, relying on these hypotheses, present the RACKNet framework, which aims to learn multiple hyper-parameters by extracting information encoded in the input datasets. Experimental evaluations of RACKNet against major benchmark datasets, such as MNIST, SVHN, CIFAR10, COIL20 and ImageNet, show that RACKNet provides significant improvements in network design and robustness to changes in the network.
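As a rough, hypothetical illustration of the general idea of letting dataset statistics drive hyper-parameter choices, the sketch below allocates a fixed kernel budget across layers in proportion to a crude patch-variance "complexity" score. The scoring rule, patch sizes, and allocation scheme are assumptions for illustration only, not RACKNet's actual procedure.

```python
# Hypothetical sketch: allocate a kernel budget across layers in proportion to
# a crude "complexity" score of local patches at each scale. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
images = rng.random(size=(100, 32, 32))        # stand-in dataset
TOTAL_KERNELS = 256
PATCH_SIZES = [3, 5, 9]                        # one entry per (hypothetical) layer

def patch_complexity(imgs, k):
    """Mean variance of k x k patches, used as a rough complexity proxy."""
    h, w = imgs.shape[1:]
    scores = []
    for _ in range(500):                       # sample random patches
        i, j = rng.integers(0, h - k), rng.integers(0, w - k)
        n = rng.integers(0, len(imgs))
        scores.append(imgs[n, i:i + k, j:j + k].var())
    return float(np.mean(scores))

complexities = np.array([patch_complexity(images, k) for k in PATCH_SIZES])
budget = np.round(TOTAL_KERNELS * complexities / complexities.sum()).astype(int)
for k, n in zip(PATCH_SIZES, budget):
    print(f"layer with {k}x{k} receptive field -> {n} kernels")
```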
{"title":"RACKNet","authors":"Yash Garg, K. Candan","doi":"10.1145/3323873.3325057","DOIUrl":"https://doi.org/10.1145/3323873.3325057","url":null,"abstract":"Despite their impressive success when these hyper-parameters are suitably fine-tuned, the design of good network architectures remains an art-form rather than a science: while various search techniques, such as grid-search, have been proposed to find effective hyper-parameter configurations, often these parameters are hand-crafted (or the bounds of the search space are provided by a user). In this paper, we argue, and experimentally show, that we can minimize the need for hand-crafting, by relying on the dataset itself. In particular, we show that the dimensions, distributions, and complexities of localized features extracted from the data can inform the structure of the neural networks and help better allocate limited resources (such as kernels) to the various layers of the network. To achieve this, we first present several hypotheses that link the properties of the localized image features to the CNN and RCNN architectures and then, relying on these hypotheses, present a RACKNet framework which aims to learn multiple hyper-parameters by extracting information encoded in the input datasets. Experimental evaluations of RACKNet against major benchmark datasets, such as MNIST, SVHN, CIFAR10, COIL20 and ImageNet, show that RACKNet provides significant improvements in the network design and robustness to change in the network.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115613744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Increasingly Packing Multiple Facial-Informatics Modules in A Unified Deep-Learning Model via Lifelong Learning
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325053
Steven C. Y. Hung, Jia-Hong Lee, Timmy S. T. Wan, Chien-Hung Chen, Yi-Ming Chan, Chu-Song Chen
Simultaneously running multiple modules is a key requirement for a smart multimedia system for facial applications, including face recognition, facial expression understanding, and gender identification. To integrate them effectively, a continual learning approach that learns new tasks without forgetting is introduced. Unlike previous methods, which grow monotonically in size, our approach maintains compactness during continual learning. The proposed packing-and-expanding method is effective and easy to implement; it can iteratively shrink and enlarge the model to integrate new functions. Our integrated multitask model can achieve similar accuracy with only 39.9% of the original size.
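A minimal, hypothetical sketch of the pack-then-expand intuition follows: after a task is learned, its most important weights are frozen behind a binary mask, and later tasks may only train the remaining free weights. The thresholds, sizes, and placeholder "training" step are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of the pack-then-expand idea: freeze the weights reserved
# by earlier tasks behind a binary mask; new tasks train only the free weights.
# Thresholds, sizes, and the "training" step are placeholders, not the paper's method.
import numpy as np

rng = np.random.default_rng(7)
weights = rng.normal(size=(8, 8))
frozen = np.zeros_like(weights, dtype=bool)     # weights reserved by earlier tasks

def pack(keep_ratio=0.5):
    """Freeze the largest-magnitude free weights for the task just learned."""
    global frozen
    free = ~frozen
    threshold = np.quantile(np.abs(weights[free]), 1.0 - keep_ratio)
    frozen |= free & (np.abs(weights) >= threshold)

def train_new_task(steps=3):
    """Update only the free (non-frozen) weights; frozen ones keep old tasks intact."""
    global weights
    for _ in range(steps):
        gradient = rng.normal(size=weights.shape)   # placeholder gradient
        weights -= 0.1 * gradient * (~frozen)

train_new_task(); pack()          # task 1: learn, then compact and freeze
print("capacity used after task 1:", frozen.mean())
train_new_task(); pack()          # task 2 reuses only the remaining free capacity
print("capacity used after task 2:", frozen.mean())
```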
Citations: 32
Recognizing User-Defined Subsequences in Human Motion Data
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3326922
J. Sedmidubský, P. Zezula
Motion capture technologies digitize human movements by tracking the 3D positions of specific skeleton joints over time. Such spatio-temporal multimedia data have an enormous application potential in many fields, ranging from computer animation, through security and sports, to medicine, but their computerized processing is a difficult problem. In this paper, we focus on the important task of recognizing a user-defined motion, based on a collection of labelled actions known in advance. We utilize current advances in deep feature learning and scalable similarity retrieval to build an effective and efficient k-nearest-neighbor recognition technique for 3D human motion data. The properties of the technique are demonstrated by a web application which allows a user to browse long motion sequences and specify any subsequence as the input for probabilistic recognition based on 130 predefined classes.
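As a generic illustration of the retrieval-based recognition the abstract describes, here is a small k-nearest-neighbor sketch over fixed-length motion features. The random features, labels, and distance choice are placeholders, not the paper's learned deep features.

```python
# Minimal sketch of k-nearest-neighbor recognition over fixed-length motion
# features. Features and labels are random placeholders, not the paper's
# learned deep features; only the class count (130) follows the abstract.
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
N_REF, D, K = 1000, 256, 5
ref_features = rng.normal(size=(N_REF, D))          # labelled action collection
ref_labels = rng.integers(0, 130, size=N_REF)

def knn_recognize(query, k=K):
    """Return (predicted label, vote fraction) for a query subsequence feature."""
    dists = np.linalg.norm(ref_features - query, axis=1)
    nearest = ref_labels[np.argsort(dists)[:k]]
    label, votes = Counter(nearest.tolist()).most_common(1)[0]
    return label, votes / k                          # vote fraction as a crude confidence

query_feature = rng.normal(size=D)
print(knn_recognize(query_feature))
```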
Citations: 0
Towards Cloud Distributed Image Indexing by Sparse Hashing
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325046
André Mourão, João Magalhães
Distributing multimedia indexes to multiple nodes enables search over very large datasets (i.e., over one billion images and videos), but comes with a set of challenges: how to distribute documents and queries effectively across nodes to support concurrent querying? And how to deal with the increased potential for lack of response from nodes (e.g., node fail-stops or dropped network packets)? An index whose partitions are based on the distribution of feature vectors in the original space can improve redundancy and increase efficiency: nearest neighbors are only present on a small, set number of partitions, reducing the number of nodes to inspect for each query. This paper describes how sparse hashes can help find this balance and create better distribution policies for high-dimensional feature vectors. Inspired by existing literature on distributed text and media indexes, our proposal distributes and balances documents and queries to a subset of the nodes, according to their orthogonal similarities. We performed exhaustive benchmarks of our approach on a commercial cloud service. Experiments on a one-billion-vector dataset show that our approach has a low partitioning overhead (3 to 5 ms per query), achieves balanced document and query distribution (the variation in document and query distribution across nodes is smaller than 1% and 10%, respectively), handles concurrent queries effectively, and degrades gracefully with node failures (less than 2% precision loss per node down).
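As a hypothetical sketch of the routing idea, the code below uses a sparse response over random projections to decide which subset of nodes stores a document and which subset a query visits. The projection "hash", node count, and routing rule are illustrative assumptions, not the paper's actual sparse-hashing scheme.

```python
# Hypothetical sketch: use a sparse code to pick the few nodes that store a
# document and the few nodes a query must visit. The random-projection "hash",
# node count, and routing rule are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(4)
D, N_NODES, K = 128, 16, 3                     # feature dim, nodes, replicas per vector
projection = rng.normal(size=(D, N_NODES))     # one projection per node/partition

def route(vector, k=K):
    """Return the k node ids whose projections respond most strongly."""
    scores = np.abs(vector @ projection)
    return np.argsort(scores)[-k:][::-1]

doc = rng.normal(size=D)
query = doc + 0.05 * rng.normal(size=D)        # a near-duplicate query
print("document stored on nodes:", route(doc))
print("query sent to nodes:     ", route(query))
# Nearby vectors tend to produce similar sparse responses, so the query usually
# reaches a node holding its nearest neighbours while probing only k of 16 nodes.
```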
Citations: 2
Annotating Objects and Relations in User-Generated Videos
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325056
Xindi Shang, Donglin Di, Junbin Xiao, Yu Cao, Xun Yang, Tat-Seng Chua
Understanding the objects and relations between them is indispensable to fine-grained video content analysis, which is widely studied in recent research works in multimedia and computer vision. However, existing works are limited to evaluating with either small datasets or indirect metrics, such as the performance over images. The underlying reason is that the construction of a large-scale video dataset with dense annotation is tricky and costly. In this paper, we address several main issues in annotating objects and relations in user-generated videos, and propose an annotation pipeline that can be executed at a modest cost. As a result, we present a new dataset, named VidOR, consisting of 10k videos (84 hours) together with dense annotations that localize 80 categories of objects and 50 categories of predicates in each video. We have made the training and validation set public and extendable for more tasks to facilitate future research on video object and relation recognition.
Citations: 111
Progressive Image Enhancement under Aesthetic Guidance
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325055
Xiaoyu Du, Xun Yang, Zhiguang Qin, Jinhui Tang
Most existing image enhancement methods function like a black box that cannot clearly reveal the procedure behind each image enhancement operation. To overcome this limitation, in this paper we design a progressive image enhancement framework, which generates an expected "good" retouched image with a group of self-interpretable image filters under the guidance of an aesthetic assessment model. The introduced aesthetic network effectively alleviates the shortage of paired training samples by providing extra supervision, and eliminates the bias caused by human subjective preferences. The self-interpretable image filters designed in our image enhancement framework make the overall image enhancement procedure easy to understand. Extensive experiments demonstrate the effectiveness of our proposed framework.
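To make the notion of self-interpretable filters concrete, here is a minimal, hypothetical sketch of a chain of named filters (exposure, contrast, gamma) applied progressively to an image; in the paper the filter parameters would be predicted by the learned model, whereas here they are fixed placeholders.

```python
# Hypothetical sketch of a chain of self-interpretable filters applied
# progressively. Parameter values are fixed placeholders, not model outputs.
import numpy as np

def exposure(img, ev):           # brightness in "stops"
    return np.clip(img * (2.0 ** ev), 0.0, 1.0)

def contrast(img, c):            # scale around mid-grey
    return np.clip((img - 0.5) * c + 0.5, 0.0, 1.0)

def gamma(img, g):               # tone curve
    return np.clip(img, 0.0, 1.0) ** g

# Each step is a named, human-readable operation, so the retouching
# procedure can be inspected filter by filter.
pipeline = [(exposure, 0.3), (contrast, 1.2), (gamma, 0.9)]

rng = np.random.default_rng(5)
image = rng.random(size=(64, 64, 3))            # stand-in input image
for fn, param in pipeline:
    image = fn(image, param)
    print(f"applied {fn.__name__}({param}); mean intensity = {image.mean():.3f}")
```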
Citations: 9
Methods of Multi-Modal Data Exploration
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325858
Tomás Grosup
Techniques and tools designed for information retrieval, data exploration, or data analytics tasks are based on the relational and text-search models, and cannot be easily applied to unstructured data such as images or videos. Research communities have been trying to reveal the semantics of multimedia over the last decades, with ever-improving results in various tasks, dominated by the latest successes of deep learning. The limits of object retrieval models drive the need for data exploration methods that support multi-modal data, such as multimedia surrounded by structured attributes. In this paper, we describe, implement and evaluate exploration methods using multiple modalities and retrieval models in the context of multimedia. We apply the techniques to e-commerce product search and recommendation, and demonstrate benefits for different retrieval scenarios. Lastly, we propose a method for extending a database schema with latent visual attributes learned from image data. This closes the loop by going back to relational data, and potentially benefits a range of industrial applications.
Citations: 0
An Unsupervised Genetic Algorithm Framework for Rank Selection and Fusion on Image Retrieval
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325022
Lucas Pascotti Valem, D. C. G. Pedronette
Despite the major advances in feature development for low- and mid-level representations, a single visual feature is often insufficient to achieve effective retrieval results in different scenarios. Since diverse visual properties provide distinct and often complementary information for the same query, the combination of different features, including handcrafted and learned features, has been established as a relevant trend in image retrieval. An intrinsically difficult task consists in selecting and combining features that provide a highly effective result, which is often supported by supervised learning methods. However, in the absence of labeled data, selecting and fusing features in a completely unsupervised fashion becomes an essential, although very challenging, task. The proposed genetic algorithm employs effectiveness estimation measures as fitness functions, making the evolutionary process fully unsupervised. Our approach was evaluated on 3 public datasets and 35 different descriptors, achieving relative gains of up to +53.96% in scenarios with more than 8 billion possible combinations of rankers. The framework was also compared to different baselines, including state-of-the-art methods.
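As a rough, hypothetical sketch of the overall loop, the code below runs a genetic algorithm over subsets of rankers, fuses the selected rank lists by Borda count, and scores each subset with a crude unsupervised top-k agreement proxy; the actual effectiveness estimation measures, genetic operators, and parameters in the paper differ and are not reproduced here.

```python
# Hypothetical sketch: a genetic algorithm searches over subsets of rankers,
# scoring each subset with a placeholder unsupervised fitness (top-k agreement
# with the Borda-fused list), not the paper's effectiveness estimation measures.
import numpy as np

rng = np.random.default_rng(6)
N_RANKERS, N_DOCS, TOP_K = 8, 50, 10
# Stand-in rank lists: each ranker is a permutation of document ids.
rankers = [rng.permutation(N_DOCS) for _ in range(N_RANKERS)]

def borda_fuse(selected):
    scores = np.zeros(N_DOCS)
    for idx in selected:
        scores[rankers[idx]] += np.arange(N_DOCS, 0, -1)   # higher rank -> more points
    return np.argsort(-scores)

def fitness(mask):
    selected = np.flatnonzero(mask)
    if len(selected) < 2:
        return 0.0
    fused_top = set(borda_fuse(selected)[:TOP_K].tolist())
    # Placeholder unsupervised estimate: each selected ranker's top-k agreement with the fusion.
    overlaps = [len(fused_top & set(rankers[i][:TOP_K].tolist())) / TOP_K for i in selected]
    return float(np.mean(overlaps))

population = rng.integers(0, 2, size=(20, N_RANKERS))       # random ranker subsets
for generation in range(30):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(-scores)[:10]]           # selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, N_RANKERS)                      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.integers(0, N_RANKERS)] ^= 1                # mutation: flip one bit
        children.append(child)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected rankers:", np.flatnonzero(best))
```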
Citations: 9