
Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Jaguar
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240561
Wenxiao Zhang, Bo Han, P. Hui
In this paper, we present the design, implementation and evaluation of Jaguar, a mobile Augmented Reality (AR) system that features accurate, low-latency, and large-scale object recognition and flexible, robust, and context-aware tracking. Jaguar pushes the limit of mobile AR's end-to-end latency by leveraging hardware acceleration with GPUs on edge cloud. Another distinctive aspect of Jaguar is that it seamlessly integrates marker-less object tracking offered by the recently released AR development tools (e.g., ARCore and ARKit) into its design. Indeed, some approaches used in Jaguar have been studied before in a standalone manner, e.g., it is known that cloud offloading can significantly decrease the computational latency of AR. However, the question of whether the combination of marker-less tracking, cloud offloading and GPU acceleration would satisfy the desired end-to-end latency of mobile AR (i.e., the interval of camera frames) has not been eloquently addressed yet. We demonstrate via a prototype implementation of our proposed holistic solution that Jaguar reduces the end-to-end latency to ~33 ms. It also achieves accurate six degrees of freedom tracking and 97% recognition accuracy for a dataset with 10,000 images.
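For context on the ~33 ms figure: the abstract defines the target end-to-end latency as the camera frame interval. Assuming a typical 30 fps camera preview (the frame rate itself is not stated in the abstract), a minimal sketch of that budget looks like this:

```python
# Frame-interval latency budget for mobile AR offloading (illustrative only).
# The 30 fps preview rate is an assumption; the abstract states only that the
# target is the interval between camera frames and that Jaguar reaches ~33 ms.
FPS = 30
FRAME_INTERVAL_MS = 1000.0 / FPS  # ~33.3 ms

def within_budget(end_to_end_latency_ms: float) -> bool:
    """True if the recognition/tracking result arrives before the next frame."""
    return end_to_end_latency_ms <= FRAME_INTERVAL_MS

print(round(FRAME_INTERVAL_MS, 1))  # 33.3
print(within_budget(33.0))          # True, matching the reported ~33 ms
```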
{"title":"Jaguar","authors":"Wenxiao Zhang, Bo Han, P. Hui","doi":"10.1145/3240508.3240561","DOIUrl":"https://doi.org/10.1145/3240508.3240561","url":null,"abstract":"In this paper, we present the design, implementation and evaluation of Jaguar, a mobile Augmented Reality (AR) system that features accurate, low-latency, and large-scale object recognition and flexible, robust, and context-aware tracking. Jaguar pushes the limit of mobile AR's end-to-end latency by leveraging hardware acceleration with GPUs on edge cloud. Another distinctive aspect of Jaguar is that it seamlessly integrates marker-less object tracking offered by the recently released AR development tools (e.g., ARCore and ARKit) into its design. Indeed, some approaches used in Jaguar have been studied before in a standalone manner, e.g., it is known that cloud offloading can significantly decrease the computational latency of AR. However, the question of whether the combination of marker-less tracking, cloud offloading and GPU acceleration would satisfy the desired end-to-end latency of mobile AR (i.e., the interval of camera frames) has not been eloquently addressed yet. We demonstrate via a prototype implementation of our proposed holistic solution that Jaguar reduces the end-to-end latency to ~33 ms. It also achieves accurate six degrees of freedom tracking and 97% recognition accuracy for a dataset with 10,000 images.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134107758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 62
Improving QoE of ABR Streaming Sessions through QUIC Retransmissions
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240664
Divyashri Bhat, R. Deshmukh, M. Zink
While adaptive bitrate (ABR) streaming has contributed significantly to the reduction of video playout stalling, ABR clients continue to suffer from variation in bit rate quality over the duration of a streaming session. Similar to stalling, these variations in bit rate quality have a negative impact on the users' Quality of Experience (QoE). In this paper, we use a trace from a large-scale CDN to show that such quality changes occur in a significant number of streaming sessions, and we investigate an ABR video segment retransmission approach to reduce the number of such quality changes. As the new HTTP/2 standard becomes increasingly popular, we also see increased usage of QUIC as an alternative protocol for the transmission of web traffic, including video streaming. Under various network conditions, we conduct a systematic comparison of existing transport layer approaches for HTTP/2 to determine which is best suited for ABR segment retransmissions. Since it is well known that both protocols provide a series of improvements over HTTP/1.1, we perform experiments both in controlled environments and over transcontinental Internet links. We find that these benefits also "trickle up" into the application layer for ABR video streaming, where QUIC retransmissions can significantly improve the average quality bitrate while simultaneously minimizing bit rate variations over the duration of a streaming session.
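To make the retransmission idea concrete, here is a purely illustrative sketch of a client-side heuristic: when the estimated throughput leaves headroom, re-request an already buffered low-quality segment at a higher bitrate before its playout deadline. The function name, safety factor, and fixed segment duration are assumptions for illustration; this is not the algorithm evaluated in the paper.

```python
# Illustrative ABR segment-retransmission heuristic (not the paper's algorithm).
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BufferedSegment:
    index: int              # position of the segment in the stream
    bitrate_kbps: int       # quality level currently sitting in the buffer
    seconds_to_play: float  # time left before this segment starts playing

def pick_retransmission(buffer: List[BufferedSegment],
                        ladder_kbps: List[int],
                        throughput_kbps: float,
                        segment_duration_s: float = 4.0,
                        safety: float = 1.2) -> Optional[Tuple[BufferedSegment, int]]:
    """Return (segment, target_bitrate) to re-download at higher quality, or None.

    'safety' is a hypothetical headroom factor guarding against throughput
    estimation error; 'segment_duration_s' is assumed fixed for simplicity.
    """
    if not buffer:
        return None
    candidate = min(buffer, key=lambda s: s.bitrate_kbps)  # worst buffered quality
    for target in sorted(ladder_kbps, reverse=True):
        if target <= candidate.bitrate_kbps:
            break
        # Rough feasibility check: can the upgraded segment finish downloading
        # before the original copy would start playing?
        download_s = target * segment_duration_s / throughput_kbps * safety
        if download_s < candidate.seconds_to_play:
            return candidate, target
    return None
```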
Citations: 23
Dissimilarity Representation Learning for Generalized Zero-Shot Recognition
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240686
Gang Yang, Jinlu Liu, Jieping Xu, Xirong Li
Generalized zero-shot learning (GZSL) aims to recognize any test instance coming either from a known class or from a novel class that has no training instances. To synthesize training instances for novel classes, and thus resolve GZSL as a common classification problem, we propose a Dissimilarity Representation Learning (DSS) method. A dissimilarity representation describes a specific instance in terms of its (dis)similarity to other instances in a visual or attribute-based feature space. In the dissimilarity space, instances of the novel classes are synthesized by an end-to-end optimized neural network. The neural network realizes two-level feature mappings and domain adaptations in the dissimilarity space and the attribute-based feature space. Experimental results on five benchmark datasets, i.e., AWA, AWA2, SUN, CUB, and aPY, show that the proposed method improves the state of the art by a large margin, with approximately 10% gain in terms of the harmonic mean of the top-1 accuracy. Consequently, this paper establishes a new baseline for GZSL.
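The central object here, the dissimilarity representation, is simply a description of an instance by its distances to a set of reference instances. A minimal sketch follows; the Euclidean distance and the particular choice of reference set are assumptions for illustration, not necessarily the paper's configuration.

```python
# Minimal sketch of a dissimilarity representation.
# Assumptions: Euclidean distance and a fixed reference set of k instances.
import numpy as np

def dissimilarity_representation(x: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Map a feature vector to the vector of its distances to each reference.

    x:          shape (d,)   -- visual or attribute-based feature of one instance
    references: shape (k, d) -- reference instances in the same feature space
    returns:    shape (k,)   -- the embedding of x in the dissimilarity space
    """
    return np.linalg.norm(references - x, axis=1)

# Usage idea: embed both seen-class and synthesized novel-class instances into
# the same k-dimensional dissimilarity space, then train an ordinary classifier.
refs = np.random.rand(5, 128)
x = np.random.rand(128)
print(dissimilarity_representation(x, refs).shape)  # (5,)
```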
Citations: 20
Dynamic Sound Field Synthesis for Speech and Music Optimization
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240644
Zhenyu Tang, Nicolás Morales, Dinesh Manocha
We present a novel acoustic optimization algorithm to synthesize dynamic sound fields in a static scene. Our approach places new active loudspeakers or virtual sources in the scene so that the dynamic sound field in a region satisfies optimization criteria that improve speech and music perception. We use a frequency domain formulation of sound propagation and reduce the computation of dynamic sound field synthesis to solving a linear least squares problem, without imposing any constraints on the environment, loudspeaker type, or loudspeaker placement. We highlight the performance on complex indoor scenes in terms of speech and music improvements. We evaluate the performance with a user study and highlight the perceptual benefits for virtual reality and multimedia applications.
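The least-squares reduction mentioned above can be illustrated directly: given a frequency-domain propagation matrix from candidate sources to control points and a desired complex pressure field at those points, the source weights come from a single linear solve per frequency bin. The matrix and target below are random placeholders, not measured transfer functions.

```python
# Illustrative least-squares sound field synthesis for one frequency bin.
# A[m, n] is an (assumed known) transfer function from candidate source n to
# control point m; b is the desired complex pressure at the control points.
import numpy as np

rng = np.random.default_rng(0)
m_points, n_sources = 64, 8
A = rng.normal(size=(m_points, n_sources)) + 1j * rng.normal(size=(m_points, n_sources))
b = rng.normal(size=m_points) + 1j * rng.normal(size=m_points)

# Complex source weights minimizing ||A w - b||_2 for this frequency bin.
w, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(w.shape)  # (8,)
```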
Citations: 4
Online Inter-Camera Trajectory Association Exploiting Person Re-Identification and Camera Topology
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240663
Na Jiang, Si-Yuan Bai, Yue Xu, Chang Xing, Zhong Zhou, Wei Wu
Online inter-camera trajectory association is a promising topic in intelligent video surveillance; it concentrates on associating trajectories that belong to the same individual across different cameras over time. The task remains challenging due to the inconsistent appearance of a person in different cameras and the lack of spatio-temporal constraints between cameras. Besides, orientation variations and partial occlusions significantly increase the difficulty of inter-camera trajectory association. To solve these problems, this work proposes an orientation-driven person re-identification (ODPR) method and an effective camera topology estimation based on appearance features for online inter-camera trajectory association. ODPR explicitly leverages orientation cues and stable torso features to learn discriminative feature representations for identifying trajectories across cameras, which alleviates pedestrian orientation variations through the designed orientation-driven loss function and orientation-aware weights. The effective camera topology estimation introduces appearance features to generate the correct spatio-temporal constraints for narrowing the retrieval range, which improves time efficiency and makes intelligent inter-camera trajectory association feasible in large-scale surveillance environments. Extensive experimental results demonstrate that our proposed approach significantly outperforms most state-of-the-art methods on popular person re-identification datasets and the public multi-target, multi-camera tracking benchmark.
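The abstract names, but does not spell out, the orientation-driven loss. Purely as one hypothetical reading of how orientation-aware weights might enter a pairwise re-identification objective, the sketch below up-weights positive pairs observed from different orientations so that the embedding is pushed toward orientation invariance. The weighting scheme and margin are inventions for illustration and are not the ODPR loss itself.

```python
# Hypothetical illustration only: one way orientation-aware weights could enter
# a pairwise re-identification loss. This is NOT the ODPR loss from the paper.
import torch
import torch.nn.functional as F

def orientation_weighted_pair_loss(f1, f2, same_id, ori1, ori2, margin=0.5):
    """f1, f2: (B, D) embeddings; same_id: (B,) bool; ori1, ori2: (B,) radians."""
    d = F.pairwise_distance(f1, f2)                 # (B,) embedding distances
    # Heavier weight when the two views differ more in orientation, so that
    # matching the same person across viewpoints is emphasized during training.
    w = 1.0 + (1.0 - torch.cos(ori1 - ori2)) / 2.0  # ranges over [1, 2]
    positive = w * d.pow(2)                         # pull same-identity pairs together
    negative = F.relu(margin - d).pow(2)            # push different identities apart
    return torch.where(same_id, positive, negative).mean()
```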
Citations: 35
Drawing in a Virtual 3D Space - Introducing VR Drawing in Elementary School Art Education
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240692
W. Bolier, Wolfgang Hürst, G. V. Bommel, Joost Bosman, H. Bosman
Drawing is an important part of elementary school education, especially since it contributes to the development of spatial skills. Virtual reality enables us to draw not just on a flat 2D surface, but in 3D space. Our research aims at showing if and how this form of 3D drawing can be beneficial for art education. This paper presents first insights into potential benefits and obstacles when introducing 3D drawing at elementary schools. In an experiment with 18 children, we studied practical aspects, proficiency, and spatial ability development. Our results show improvement in the children's 3D drawing skills but not in their spatial abilities. Their drawing skills also do seem to be correlated with their mental rotation ability, although further research is needed to conclusively confirm this.
Citations: 11
Fashion Sensitive Clothing Recommendation Using Hierarchical Collocation Model
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240596
Zhengzhong Zhou, Xiu Di, Wei Zhou, Liqing Zhang
Automatic clothing recommendation is growing dramatically due to the boom in apparel e-commerce. In this paper, we propose a novel clothing recommendation approach that is sensitive to the fashion trend. The proposed approach incorporates expert knowledge into multiple dimensions of information, including purchase behaviors, image contents, and product descriptions, so as to recommend clothing in line with the forefront of fashion. Meanwhile, to accommodate human visual aesthetics and the user's collocation experience, we integrate a convolutional neural network and a hierarchical collocation model (HCM) into our framework. The former extracts effective visual features and attribute descriptors from the clothing items, while the latter embeds them into the concept of style topics, which interpret the collocation pattern at a higher level of semantic knowledge. Such a data-driven recommendation approach is able to learn a clothing collocation metric from multi-dimensional clothing information. Experimental results show that our HCM method achieves better performance than other state-of-the-art baselines. Besides, it also ensures the fashion sensitivity of the recommended outfits.
Citations: 15
Multi-Human Parsing Machines
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240515
Jianshu Li, Jian Zhao, Yunpeng Chen, S. Roy, Shuicheng Yan, Jiashi Feng, T. Sim
Human parsing is an important task in human-centric analysis. Despite the remarkable progress in single-human parsing, the more realistic case of multi-human parsing remains challenging in terms of the data and the model. Compared with the considerable number of available single-human parsing datasets, the datasets for multi-human parsing are very limited in number mainly due to the huge annotation effort required. Besides the data challenge to multi-human parsing, the persons in real-world scenarios are often entangled with each other due to close interaction and body occlusion, making it difficult to distinguish body parts from different person instances. In this paper we propose the Multi-Human Parsing Machines (MHPM) system, which contains an MHP Montage model and an MHP Solver, to address both challenges in multi-human parsing. Specifically, the MHP Montage model in MHPM generates realistic images with multiple persons together with the parsing labels. It intelligently composes single persons onto background scene images while maintaining the structural information between persons and the scene. The generated images can be used to train better multi-human parsing algorithms. On the other hand, the MHP Solver in MHPM solves the bottleneck of distinguishing multiple entangled persons with close interaction. It employs a Group-Individual Push and Pull (GIPP) loss function, which can effectively separate persons with close interaction. We experimentally show that the proposed MHPM can achieve state-of-the-art performance on the multi-human parsing benchmark and the person individualization benchmark, which distinguishes closely entangled person instances.
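The Group-Individual Push and Pull (GIPP) loss is only named in the abstract. As a rough, generic stand-in for a "pull within a person, push between persons" objective over part features, the sketch below pulls each part feature toward its own person's mean and pushes different persons' means apart; it is not the paper's actual formulation.

```python
# Generic pull-together / push-apart objective over per-person part features,
# offered only as an illustration of the idea behind GIPP (not the actual loss).
import torch

def push_pull_loss(part_feats: torch.Tensor, person_ids: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    """part_feats: (N, D) features of body parts; person_ids: (N,) instance ids."""
    ids = person_ids.unique()  # sorted unique person instances
    centers = torch.stack([part_feats[person_ids == i].mean(dim=0) for i in ids])
    # Pull: each part feature toward the center of the person it belongs to.
    idx = torch.searchsorted(ids, person_ids)
    pull = ((part_feats - centers[idx]) ** 2).sum(dim=1).mean()
    # Push: centers of different persons at least 'margin' apart.
    if len(ids) < 2:
        return pull
    dists = torch.cdist(centers, centers)
    off_diag = ~torch.eye(len(ids), dtype=torch.bool)
    push = torch.clamp(margin - dists[off_diag], min=0).pow(2).mean()
    return pull + push
```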
Citations: 15
Structure Guided Photorealistic Style Transfer
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240637
Yuheng Zhi, Huawei Wei, Bingbing Ni
Recent style transfer methods based on deep networks strive to generate stylized images that better match the content by adding semantic guidance to the iterative process. However, these approaches can only guarantee the transfer of the overall color and texture distribution between semantically equivalent regions; local variation within these regions cannot be accurately captured. Therefore, the resulting image lacks local plausibility. To this end, we develop a non-parametric, patch-based style transfer framework to synthesize more content-coherent images. By designing a novel patch matching algorithm that simultaneously takes high-level category information and geometric structure information (e.g., human pose and building structure) into account, our proposed method is capable of transferring more detailed distributions and producing more photorealistic stylized images. We show that our approach achieves remarkable style transfer results on contents with geometric structure, including human bodies, vehicles, buildings, etc.
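As a sketch of what it means to combine semantic and structural cues in patch matching, one can score candidate style patches by appearance distance plus penalties when semantic labels or geometric descriptors disagree. The cost form and the weights below are placeholders for illustration, not the paper's actual matching algorithm.

```python
# Illustrative patch-matching cost mixing appearance distance, semantic-label
# agreement, and a geometric cue. Weights are placeholders, not the paper's.
import numpy as np

def patch_cost(feat_c, feat_s, label_c, label_s, geom_c, geom_s,
               w_sem: float = 10.0, w_geom: float = 1.0) -> float:
    """Lower is better. feat_*: appearance features; label_*: semantic class ids;
    geom_*: scalar geometric descriptors assumed to be comparable."""
    appearance = float(np.linalg.norm(np.asarray(feat_c) - np.asarray(feat_s)))
    semantic = 0.0 if label_c == label_s else w_sem  # discourage cross-class matches
    geometric = w_geom * abs(geom_c - geom_s)        # discourage structural mismatch
    return appearance + semantic + geometric

def best_match(content_patch: dict, style_patches: list) -> dict:
    """Each patch is a dict with 'feat', 'label', and 'geom' keys."""
    return min(style_patches,
               key=lambda s: patch_cost(content_patch["feat"], s["feat"],
                                        content_patch["label"], s["label"],
                                        content_patch["geom"], s["geom"]))
```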
Citations: 6
WildFish
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240616
Peiqin Zhuang, Yali Wang, Yu Qiao
Fish recognition is an important task for understanding the marine ecosystem and biodiversity. It is often challenging to identify fish species in the wild, due to the following difficulties. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and there may still exist unknown categories on our planet. Traditional classifiers often fail to deal with this open-set scenario. Third, certain fish species are easily confused with one another, and it is often hard to discern the subtle differences from unconstrained images alone. Motivated by these facts, we introduce a large-scale WildFish benchmark for fish recognition in the wild. Specifically, we make three contributions in this paper. First, WildFish is, to our best knowledge, the largest image data set for wild fish recognition. It consists of 1,000 fish categories with 54,459 unconstrained images, allowing us to train high-capacity models for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios and investigate an open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task guided by pairwise textual descriptions. By leveraging the comparison knowledge in the sentences, we design a multi-modal fish net to effectively distinguish the two confused categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish) in order to benefit more research studies in multimedia and beyond.
{"title":"WildFish","authors":"Peiqin Zhuang, Yali Wang, Yu Qiao","doi":"10.1145/3240508.3240616","DOIUrl":"https://doi.org/10.1145/3240508.3240616","url":null,"abstract":"Fish recognition is an important task to understand the marine ecosystem and biodiversity. It is often challenging to identify fish species in the wild, due to the following difficulties. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and there may still exist unknown categories in our planet. The traditional classifiers often fail to deal with this open-set scenario. Third, certain fish species are highly-confused. It is often hard to figure out the subtle differences, only by the unconstrained images. Motivated by these facts, we introduce a large-scale WildFish benchmark for fish recognition in the wild. Specifically, we make three contributions in this paper. First, WildFish is the largest image data set for wild fish recognition, to our best knowledge. It consists of 1000 fish categories with 54,459 unconstrained images, allowing to train high-capacity models for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios, and investigate the open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task, with the guidance of pairwise textual descriptions. Via leveraging the comparison knowledge in the sentence, we design a multi-modal fish net to effectively distinguish two confused categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish), in order to bring benefit to more research studies in multimedia and beyond.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123462334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30