
Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Direction-aware Neural Style Transfer
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240629
Hao Wu, Zhengxing Sun, Weihang Yuan
Neural learning methods have been shown to be effective in style transfer. These methods, commonly called neural style transfer (NST), aim to synthesize a new image that retains the high-level structure of a content image while keeping the low-level features of a style image. However, models built on convolutional structures extract only local statistical features of style images and semantic features of content images. Because the low-level features of the content image are absent, these methods synthesize images that look unnatural and full of machine artifacts. In this paper, we find that direction, that is, the orientation of each painting stroke, better captures the soul of image style and thus generates much more natural and vivid stylizations. Based on this observation, we propose Direction-aware Neural Style Transfer (DaNST) with two major innovations. First, a novel direction field loss is proposed to steer the direction of strokes in the synthesized image. To build this loss function, we propose novel direction field loss networks to generate and compare the direction fields of the content image and the synthesized image. By incorporating the direction field loss into neural style transfer, we obtain a new optimization objective; minimizing it produces synthesized images that better follow the direction field of the content image. Second, our method provides a simple interaction mechanism to control the generated direction fields, and thereby the texture direction in synthesized images. Experiments show that our method outperforms the state of the art in most styles, such as oil painting and mosaic.
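The paper's direction field loss networks are not spelled out in the abstract, but the core idea, comparing per-pixel stroke orientation fields of the content and synthesized images, can be sketched. Below is a minimal PyTorch sketch that assumes stroke orientation can be estimated from Sobel gradients; `direction_field` and `direction_field_loss` are illustrative names, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def direction_field(img):
    """Estimate a per-pixel orientation field from image gradients.

    A stand-in for the paper's learned direction field networks: the
    orientation is the Sobel gradient rotated 90 degrees, so it points
    along (not across) strokes. img: (B, 1, H, W) grayscale float in [0, 1].
    Returns (B, 2, H, W) unit (cos, sin) vectors.
    """
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(img, sobel_x, padding=1)
    gy = F.conv2d(img, sobel_y, padding=1)
    tx, ty = -gy, gx                          # tangent to the edge
    mag = torch.sqrt(tx ** 2 + ty ** 2) + 1e-8
    return torch.cat([tx / mag, ty / mag], dim=1)

def direction_field_loss(content, synthesized):
    """Penalize misalignment between the two orientation fields.

    Orientation is sign-invariant (0 and 180 degrees describe the same
    stroke direction), hence the squared cosine.
    """
    cos = (direction_field(content) * direction_field(synthesized)).sum(dim=1)
    return (1.0 - cos ** 2).mean()            # 0 when perfectly aligned
```

Adding such a term, weighted, to the usual content and style losses would give the new optimization objective the abstract describes.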
Citations: 15
Session details: Panel-2
Pub Date : 2018-10-15 DOI: 10.1145/3286937
Jiaying Liu, Wen-Huang Cheng
{"title":"Session details: Panel-2","authors":"Jiaying Liu, Wen-Huang Cheng","doi":"10.1145/3286937","DOIUrl":"https://doi.org/10.1145/3286937","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114706699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FlexStream
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240676
Ibrahim Ben Mustafa, T. Nadeem, Emir Halepovic
We present FlexStream, a programmable framework realized by implementing Software-Defined Networking (SDN) functionality on end devices. FlexStream exploits the benefits of both centralized and distributed components to achieve dynamic management of end devices, as required and in accordance with specified policies. We evaluate FlexStream on an example use case, adaptive video streaming, where bandwidth control is employed to drive the selection of video bitrates, improve stability, and increase robustness against background traffic. When applied to competing streaming clients, FlexStream reduces bitrate switching by 81%, stall duration by 92%, and startup delay by 44%, while improving fairness among players. In addition, we report the first implementation of SDN-based control on Android devices running in real Wi-Fi and live cellular networks.
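FlexStream's policies and interfaces are not given in the abstract; the toy sketch below only illustrates the mechanism it describes, a central policy capping each competing player's bandwidth so that bitrate selection stabilizes. All names and the equal-share policy are assumptions for illustration.

```python
# Toy illustration of bandwidth-driven bitrate selection (not FlexStream's API).
BITRATE_LADDER_KBPS = [235, 375, 560, 750, 1050, 1750, 2350, 3000]

def allocate_fair_shares(total_capacity_kbps, num_players):
    """Equal-share allocation; real policies could weight players differently."""
    return [total_capacity_kbps / max(num_players, 1)] * num_players

def pick_bitrate(cap_kbps, headroom=0.9):
    """Highest ladder rung under the enforced cap, with safety headroom."""
    usable = cap_kbps * headroom
    fitting = [b for b in BITRATE_LADDER_KBPS if b <= usable]
    return fitting[-1] if fitting else BITRATE_LADDER_KBPS[0]

if __name__ == "__main__":
    caps = allocate_fair_shares(total_capacity_kbps=6000, num_players=3)
    print([pick_bitrate(c) for c in caps])  # [1750, 1750, 1750]: stable and fair
```

With a stable per-player cap, each client settles on one rung instead of oscillating, which is consistent with the reported reduction in bitrate switching.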
{"title":"FlexStream","authors":"Ibrahim Ben Mustafa, T. Nadeem, Emir Halepovic","doi":"10.1145/3240508.3240676","DOIUrl":"https://doi.org/10.1145/3240508.3240676","url":null,"abstract":"We present FlexStream, a programmable framework realized by implementing Software-Defined Networking (SDN) functionality on end devices. FlexStream exploits the benefits of both centralized and distributed components to achieve dynamic management of end devices, as required and in accordance with specified policies. We evaluate FlexStream on one example use case -- the adaptive video streaming, where bandwidth control is employed to drive selection of video bitrates, improve stability and increase robustness against background traffic. When applied to competing streaming clients, FlexStream reduces bitrate switching by 81%, stall duration by 92%, and startup delay by 44%, while improving fairness among players. In addition, we report the first implementation of SDN-based control in Android devices running in real Wi-Fi and live cellular networks.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Unprecedented Usage of Pre-trained CNNs on Beauty Product
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3266433
Jian Han Lim, Nurul Japar, Chun Chet Ng, Chee Seng Chan
How does a pre-trained Convolutional Neural Network (CNN) model perform on beauty and personal care items (i.e., Perfect-500K)? This is the question we attempt to answer in this paper by adopting several well-known deep learning models pre-trained on ImageNet and evaluating their performance using different distance metrics. In the Perfect Corp Challenge, we secure fourth position using only pre-trained models.
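The approach is standard enough to sketch: embed images with an ImageNet-pretrained backbone and rank gallery items by a distance metric. The choice of ResNet-50 and cosine similarity below is an assumption for illustration; the paper compares several models and metrics.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()            # keep the 2048-d pooled feature
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return F.normalize(backbone(x), dim=1)   # L2-normalize for cosine similarity

@torch.no_grad()
def rank(query_path, gallery_paths):
    q = embed(query_path)
    g = torch.cat([embed(p) for p in gallery_paths])
    sims = (q @ g.T).squeeze(0)              # cosine similarities to the query
    order = sims.argsort(descending=True).tolist()
    return [gallery_paths[i] for i in order]
```

Replacing the normalized dot product with `torch.cdist` gives a Euclidean variant, so different distance metrics can be compared without retraining anything.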
Citations: 9
An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240706
Wenxue Cui, F. Jiang, Xinwei Gao, Shengping Zhang, Debin Zhao
Traditional image compressed sensing (CS) coding frameworks solve an inverse problem based on measurement coding tools (prediction, quantization, entropy coding, etc.) and optimization-based image reconstruction methods. These CS coding frameworks face the challenge of improving coding efficiency at the encoder while suffering from high computational complexity at the decoder. In this paper, we take a step forward and propose a novel deep-network-based CS coding framework for natural images, which consists of three sub-networks: a sampling sub-network, an offset sub-network and a reconstruction sub-network, responsible for sampling, quantization and reconstruction, respectively. By cooperatively utilizing these sub-networks, the framework can be trained end-to-end with a proposed rate-distortion optimization loss function. The proposed framework not only improves coding performance but also dramatically reduces the computational cost of image reconstruction. Experimental results on benchmark datasets demonstrate that the proposed method achieves superior rate-distortion performance against state-of-the-art methods.
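The abstract fixes the structure (sampling, offset and reconstruction sub-networks trained end-to-end) but not the layers. A schematic PyTorch sketch under assumed block size, sampling ratio, quantization step and layer widths, none of which come from the paper:

```python
import torch
import torch.nn as nn

class DeepQuantizedCS(nn.Module):
    """Schematic of the three-sub-network layout; all shapes are guesses."""
    def __init__(self, block=32, ratio=0.1):
        super().__init__()
        m = int(block * block * ratio)        # measurements per image block
        # Sampling sub-network: block-wise measurement as a strided conv.
        self.sample = nn.Conv2d(1, m, kernel_size=block, stride=block, bias=False)
        # Offset sub-network: predicts a correction for quantization error.
        self.offset = nn.Sequential(
            nn.Conv2d(m, m, 1), nn.ReLU(inplace=True), nn.Conv2d(m, m, 1))
        # Reconstruction sub-network: back to pixels, then residual refinement.
        self.init_rec = nn.ConvTranspose2d(m, 1, kernel_size=block, stride=block)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, x, step=1.0 / 255):
        y = self.sample(x)                    # x: (B, 1, H, W), H, W % block == 0
        # Uniform quantization with a straight-through estimator so the
        # whole pipeline stays trainable end-to-end.
        y_q = y + (torch.round(y / step) * step - y).detach()
        y_q = y_q + self.offset(y_q)          # compensate quantization error
        rec = self.init_rec(y_q)
        return rec + self.refine(rec)
```

A rate-distortion loss would then combine reconstruction error on the output with a rate term on the quantized measurements.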
Citations: 9
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240575
Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, H. Meng
Human-computer conversational interactions are increasingly pervasive in real-world applications such as chatbots and virtual assistants. The user experience can be enhanced through affective design of such conversational dialogs, especially by enabling the computer to understand the emotive state in the user's input and to generate an appropriate system response within the dialog turn. Such a system response may further influence the user's emotive state in the subsequent dialog turn. In this paper, we focus on the change in the user's emotive state across adjacent dialog turns, which we refer to as user emotive state change. We propose a multi-modal, multi-task deep learning framework to infer the user's emotive states and emotive state changes simultaneously. A multi-task learning convolution fusion auto-encoder fuses the acoustic and textual features to generate a robust representation of the user's input. A long short-term memory recurrent auto-encoder extracts sentence-level features of system responses to better capture factors affecting user emotive states. A multi-task learned structured output layer models the dependency of user emotive state change, conditioned on the emotive state of the user's input and the system response in the current dialog turn. Experimental results demonstrate the effectiveness of the proposed method.
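A minimal sketch of the multi-task structure the abstract describes: a shared fused representation feeds one head for the emotive state and a second head, conditioned on the first, for the state change. The dimensions and the simple concatenation fusion below are illustrative stand-ins for the paper's convolution fusion auto-encoder and structured output layer.

```python
import torch
import torch.nn as nn

class EmotiveMultiTask(nn.Module):
    def __init__(self, acoustic_dim=128, text_dim=300, hidden=256,
                 num_states=4, num_changes=3):
        super().__init__()
        self.fuse = nn.Sequential(            # stand-in for convolution fusion
            nn.Linear(acoustic_dim + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.state_head = nn.Linear(hidden, num_states)
        # The change head sees the state logits, loosely mirroring the
        # dependency modeled by the structured output layer.
        self.change_head = nn.Linear(hidden + num_states, num_changes)

    def forward(self, acoustic, text):
        h = self.fuse(torch.cat([acoustic, text], dim=-1))
        state_logits = self.state_head(h)
        change_logits = self.change_head(torch.cat([h, state_logits], dim=-1))
        return state_logits, change_logits

# Joint training would minimize a weighted sum of the two cross-entropy losses.
```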
Citations: 16
Historical Context-based Style Classification of Painting Images via Label Distribution Learning
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240593
Jufeng Yang, Liyi Chen, Le Zhang, Xiaoxiao Sun, Dongyu She, Shao-Ping Lu, Ming-Ming Cheng
Analyzing and categorizing the style of visual art images, especially paintings, is gaining popularity owing to its importance in understanding and appreciating art. The evolution of painting style is both continuous, in the sense that new styles may inherit, develop or even mutate from their predecessors, and multi-modal, owing to factors such as visual appearance, birthplace, origin time and art movement. Motivated by this peculiarity, we introduce a novel knowledge distilling strategy to assist visual feature learning in convolutional neural networks for painting style classification. More specifically, a multi-factor distribution is employed as soft labels to distill information complementary to the visual input, extracted from different historical contexts via label distribution learning. The proposed method is encapsulated in a multi-task learning framework that allows end-to-end training. We demonstrate the superiority of the proposed method over state-of-the-art approaches on the Painting91, OilPainting, and Pandora datasets.
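The label-distribution idea can be sketched directly: replace the one-hot style label with a soft distribution mixed from several historical factors, and train against it with a KL divergence. The factor set and mixing weights below are illustrative assumptions, not the paper's multi-factor distribution.

```python
import torch
import torch.nn.functional as F

def soft_style_label(style_onehot, era_dist, movement_dist,
                     weights=(0.6, 0.2, 0.2)):
    """Mix the one-hot style with distributions over related styles implied
    by origin time and art movement (all tensors of shape (num_styles,))."""
    w1, w2, w3 = weights
    dist = w1 * style_onehot + w2 * era_dist + w3 * movement_dist
    return dist / dist.sum()                  # renormalize to a distribution

def ldl_loss(logits, soft_labels):
    """KL(soft_labels || predictions): standard label distribution learning."""
    return F.kl_div(F.log_softmax(logits, dim=-1), soft_labels,
                    reduction="batchmean")
```

Training a classifier against `ldl_loss` instead of plain cross-entropy is what lets the historical context act as the distilled complementary signal.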
Citations: 17
Deep Learning Interpretation
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241472
J. Sang
Deep learning has been successfully exploited to address different multimedia problems in recent years. Academic researchers are now shifting their attention from identifying what problems deep learning CAN address to exploring what problems deep learning CAN NOT address. This tutorial starts with a summary of six 'CAN NOT' problems deep learning fails to solve at the current stage: low stability, debugging difficulty, poor parameter transparency, poor incrementality, poor reasoning ability, and machine bias. These problems share a common origin: the lack of deep learning interpretation. The tutorial maps the six problems to three levels of deep learning interpretation: (1) Locating: accurately and efficiently locating which features contribute most to the output. (2) Understanding: bidirectional semantic access between human knowledge and the deep learning algorithm. (3) Expandability: storing, accumulating and reusing models learned from deep learning. Existing studies at these three levels are reviewed in detail, and a discussion of interesting future directions is provided at the end.
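The 'Locating' level has a well-known minimal instance, gradient saliency: the absolute input gradient of the target logit marks which pixels contribute most to the output. The sketch below is one such generic method, not a technique the tutorial itself proposes.

```python
import torch

def saliency_map(model, x, target_class):
    """Absolute input-gradient saliency for a single-image batch.

    x: (1, C, H, W) float tensor. Returns (1, H, W), max over channels.
    """
    model.eval()
    x = x.detach().clone().requires_grad_(True)
    logit = model(x)[0, target_class]   # scalar logit of the target class
    logit.backward()                    # d(logit) / d(input pixels)
    return x.grad.abs().max(dim=1)[0]   # large values = influential pixels
```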
Citations: 3
Cumulative Nets for Edge Detection
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240688
Jingkuan Song, Zhilong Zhou, Lianli Gao, Xing Xu, Heng Tao Shen
Much recent progress has been made by using Convolutional Neural Networks (CNN) for edge detection. Given the hierarchical representations learned in a CNN, it is intuitive to design side networks that exploit the richer convolutional features to improve edge detection. However, different side networks are isolated, and the final result is usually a weighted sum of side outputs of uneven quality. To tackle these issues, we propose a Cumulative Network (C-Net), which learns each side network cumulatively from the current visual features and lower-level side outputs, gradually removing detailed or sharp boundaries to enable high-resolution and accurate edge detection. Lower-level edge information is thus cumulatively inherited while superfluous details are progressively abandoned. In fact, recursively learning where to remove superfluous details from the current edge map under the supervision of higher-level visual features is challenging. Furthermore, we employ atrous convolution (AC) and atrous spatial pyramid pooling (ASPP) to robustly detect object boundaries at multiple scales and aspect ratios. Cumulatively refining edges using high-level visual information and lower-level edge maps is achieved by our cumulative residual attention (CRA) block. Experimental results show that our C-Net sets new records for edge detection on two benchmark datasets: BSDS500 (.819 ODS, .835 OIS and .862 AP) and NYUDv2 (.762 ODS, .781 OIS, .797 AP). C-Net has great potential in other deep-learning-based applications, e.g., image classification and segmentation.
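ASPP, which the abstract relies on for multi-scale boundaries, is easy to sketch: parallel atrous convolutions at several dilation rates, concatenated and projected. The rates and widths below are illustrative, not C-Net's actual configuration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous convolutions over one feature map, then a 1x1 merge."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3,
                      padding=r, dilation=r)   # padding=r keeps spatial size
            for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.project(torch.cat(feats, dim=1))
```

Each dilation rate sees a different effective receptive field over the same features, which is what lets a single layer respond to boundaries at multiple scales.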
Citations: 6
Session details: FF-2
Pub Date : 2018-10-15 DOI: 10.1145/3286917
Peng Cui
{"title":"Session details: FF-2","authors":"Peng Cui","doi":"10.1145/3286917","DOIUrl":"https://doi.org/10.1145/3286917","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114359036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0