Image enhancement is an important branch of image processing. A few existing methods leverage Generative Adversarial Networks (GANs) for this task. However, they have several defects when applied to a specific type of image, such as food photos. First, a large set of original-enhanced image pairs is required to train GANs that have millions of parameters, and such image pairs are expensive to acquire. Second, the color distribution of enhanced images generated by previous methods is not consistent with that of the original images, which is undesirable. To alleviate these issues, we propose a novel method for food photo enhancement that requires no original-enhanced image pairs, only original images. We investigate Food Faithful Color Semantic Rules in Enhanced Dataset Photo Enhancement (Faith-EDPE) and carefully design a lightweight generator that preserves the semantic relations among colors. We evaluate the proposed method on public benchmark databases and demonstrate its effectiveness through visual results and user studies.
{"title":"Food Photo Enhancer of One Sample Generative Adversarial Network","authors":"Shudan Wang, Liang Sun, Weiming Dong, Yong Zhang","doi":"10.1145/3338533.3366605","DOIUrl":"https://doi.org/10.1145/3338533.3366605","url":null,"abstract":"Image enhancement is an important branch in the field of image processing. A few existing methods leverage Generative Adversarial Networks (GANs) for this task. However, they have several defects when applied to a specific type of images, such as food photo. First, a large set of original-enhanced image pairs are required to train GANs that have millions of parameters. Such image pairs are expensive to acquire. Second, color distribution of enhanced images generated by previous methods is not consistent with the original ones, which is not expected. To alleviate the issues above, we propose a novel method for food photo enhancement. No original-enhanced image pairs are required except only original images. We investigate Food Faithful Color Semantic Rules in Enhanced Dataset Photo Enhancement (Faith-EDPE) and also carefully design a light generator which can preserve semantic relations among colors. We evaluate the proposed method on public benchmark databases to demonstrate the effectiveness of the proposed method through visual results and user studies.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134574624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online dating services have become popular in modern society. Pair matching prediction between two users of these services can help efficiently increase the possibility of finding their life partners. Deep learning based methods with automatic feature interaction functions, such as Factorization Machines (FM) and the cross network of the Deep & Cross Network (DCN), can model sparse categorical features and are effective for many recommendation tasks in web applications. To solve the partner recommendation task, we improve these FM-based deep models and DCN by enhancing the representation of feature interaction embeddings and proposing a novel interaction-layer design that avoids information loss. Through experiments on two real-world datasets from two online dating companies, we demonstrate the superior performance of our proposed designs.
{"title":"Deep Feature Interaction Embedding for Pair Matching Prediction","authors":"Luwei Zhang, Xueting Wang, T. Yamasaki","doi":"10.1145/3338533.3366597","DOIUrl":"https://doi.org/10.1145/3338533.3366597","url":null,"abstract":"Online dating services have become popular in modern society. Pair matching prediction between two users in these services can help efficiently increase the possibility of finding their life partners. Deep learning based methods with automatic feature interaction functions such as Factorization Machines (FM) and cross network of Deep & Cross Network (DCN) can model sparse categorical features, which are effective to many recommendation tasks of web applications. To solve the partner recommendation task, we improve these FM-based deep models and DCN by enhancing the representation of feature interaction embedding and proposing a novel design of interaction layer avoiding information loss. Through the experiments on two real-world datasets of two online dating companies, we demonstrate the superior performances of our proposed designs.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127678914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is common for TV audiences to want to quickly browse scenes featuring certain actors in a TV series. Since 2016, the TREC Video Retrieval Evaluation (TRECVID) Instance Search (INS) task has focused on identifying a target person in a target scene simultaneously. In this paper, we refer to this kind of task as P-S INS (Person-Scene Instance Search). To find P-S instances, most approaches search for the person and the scene separately, and then directly combine the results by addition or multiplication. However, we find that the person and scene INS modules are not always effective at the same time, and they may suppress each other in some situations, so aggregating the results shot by shot is not a good choice. Fortunately, in TV series, video shots are arranged in chronological order. We therefore extend our focus from time points (single video shots) to time slices (multiple consecutive video shots) along the timeline. By detecting salient time slices, we prune the data; by evaluating the importance of salient time slices, we boost the aggregation results. Extensive experiments on the large-scale TRECVID INS dataset demonstrate the effectiveness of the proposed method.
{"title":"Salient Time Slice Pruning and Boosting for Person-Scene Instance Search in TV Series","authors":"Z. Wang, Fan Yang, S. Satoh","doi":"10.1145/3338533.3366594","DOIUrl":"https://doi.org/10.1145/3338533.3366594","url":null,"abstract":"It is common that TV audiences want to quickly browse scenes with certain actors in TV series. Since 2016, the TREC Video Retrieval Evaluation (TRECVID) Instance Search (INS) task has started to focus on identifying a target person in a target scene simultaneously. In this paper, we name this kind of task as P-S INS (Person-Scene Instance Search). To find out P-S instances, most approaches search person and scene separately, and then directly combine the results together by addition or multiplication. However, we find that person and scene INS modules are not always effective at the same time, or they may suppress each other in some situations. Aggregating the results shot after shot is not a good choice. Luckily, for the TV series, video shots are arranged in chronological order. We extend our focus from time point (single video shot) to time slice (multiple consecutive video shots) in the time-line. Through detecting salient time slices, we prune the data. Through evaluating the importance of salient time slices, we boost the aggregation results. Extensive experiments on the large-scale TRECVID INS dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116111026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lijuan Duan, Huiling Geng, Jun Zeng, Junbiao Pang, Qingming Huang
Crack width is an important indicator for diagnosing the safety of constructions, e.g., asphalt roads and concrete bridges. In practice, measuring crack width is a challenging task: (1) the irregular and non-smooth boundary makes traditional methods inefficient; (2) pixel-wise measurement is needed to guarantee the accuracy of a system; and (3) understanding the damage of a construction from any pre-selected point is a mandatory requirement. To address these problems, we propose a cascade Principal Component Analysis (PCA) method to efficiently measure crack width from images. First, a binary image describing the crack is obtained via off-the-shelf crack detection algorithms. Second, given a pre-selected point, PCA is used to find the main axis of the crack. Third, Robust Principal Component Analysis (RPCA) is proposed to compute the main axis of a crack with an irregular boundary. We evaluate the proposed method on a real data set. The experimental results show that the proposed method achieves state-of-the-art performance in terms of efficiency and effectiveness.
{"title":"Fast and Accurately Measuring Crack Width via Cascade Principal Component Analysis","authors":"Lijuan Duan, Huiling Geng, Jun Zeng, Junbiao Pang, Qingming Huang","doi":"10.1145/3338533.3366578","DOIUrl":"https://doi.org/10.1145/3338533.3366578","url":null,"abstract":"Crack width is an important indicator to diagnose the safety of constructions, e.g., asphalt road, concrete bridge. In practice, measuring crack width is a challenge task: (1) the irregular and non-smooth boundary makes the traditional method inefficient; (2) pixel-wise measurement guarantees the accuracy of a system and (3) understanding the damage of constructions from any pre-selected points is a mandatary requirement. To address these problems, we propose a cascade Principal Component Analysis (PCA) to efficiently measure crack width from images. Firstly, the binary crack image is obtained to describe the crack via the off-the-shelf crack detection algorithms. Secondly, given a pre-selected point, PCA is used to find the main axis of a crack. Thirdly, Robust Principal Component Analysis (RPCA) is proposed to compute the main axis of a crack with a irregular boundary. We evaluate the proposed method on a real data set. The experimental results show that the proposed method achieves the state-of-the-art performances in terms of efficiency and effectiveness.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116321583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Best Paper Session","authors":"Wen-Huang Cheng","doi":"10.1145/3379189","DOIUrl":"https://doi.org/10.1145/3379189","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127687230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep convolutional neural networks (CNNs) extract local features and learn spatial representations via convolutions in the spatial domain. Beyond spatial information, some works also capture spectral information in the frequency domain through domain-switching methods such as the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). However, most works pay attention to only a single domain, which is prone to ignoring other important features. In this work, we propose a novel network structure that combines spatial and spectral convolutions and extracts features in both the spatial and frequency domains. The input channels are divided into two groups for spatial and spectral representations, respectively, and then integrated for feature fusion. Meanwhile, we design a channel-shifting mechanism to ensure that both the spatial and spectral information of every channel is equally and adequately captured throughout the deep network. Experimental results demonstrate that, compared with state-of-the-art single-domain CNN models, our shifted spatial-spectral convolution based networks achieve better performance on image classification datasets including CIFAR10, CIFAR100, and SVHN, with considerably fewer parameters.
{"title":"Shifted Spatial-Spectral Convolution for Deep Neural Networks","authors":"Yuhao Xu, Hideki Nakayama","doi":"10.1145/3338533.3366575","DOIUrl":"https://doi.org/10.1145/3338533.3366575","url":null,"abstract":"Deep convolutional neural networks (CNNs) extract local features and learn spatial representations via convolutions in the spatial domain. Beyond the spatial information, some works also manage to capture the spectral information in the frequency domain by domain switching methods like discrete Fourier transform (DFT) and discrete cosine transform (DCT). However, most works only pay attention to a single domain, which is prone to ignoring other important features. In this work, we propose a novel network structure to combine spatial and spectral convolutions, and extract features in both spatial and frequency domains. The input channels are divided into two groups for spatial and spectral representations respectively, and then integrated for feature fusion. Meanwhile, we design a channel-shifting mechanism to ensure both spatial and spectral information of every channel are equally and adequately obtained throughout the deep networks. Experimental results demonstrate that compared with state-of-the-art CNN models in a single domain, our shifted spatial-spectral convolution based networks achieve better performance on image classification datasets including CIFAR10, CIFAR100 and SVHN, with considerably fewer parameters.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131335939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiwei Wei, Yang Yang, Jingjing Li, Lei Zhu, Lin Zuo, Heng Tao Shen
Most existing Zero-Shot Learning (ZSL) approaches adopt the semantic space as a bridge to classify unseen categories. However, it is difficult to transfer knowledge from seen categories to unseen categories through the semantic space, since the correlations among categories are uncertain and ambiguous there. In this paper, we formulate zero-shot learning as a classifier weight regression problem. Specifically, we propose a novel Residual Graph Convolution Network (ResGCN) that takes word embeddings and a knowledge graph as inputs and outputs a visual classifier for each category. ResGCN can effectively alleviate the problems of over-smoothing and over-fitting. At test time, an unseen image can be classified by ranking the inner products between its visual feature and the predicted visual classifiers. Moreover, we provide a new method to build a better knowledge graph. Our approach not only further enhances the correlations among categories but also makes it easy to add new categories to the knowledge graph. Experiments conducted on the large-scale ImageNet 2011 21K dataset demonstrate that our method significantly outperforms existing state-of-the-art approaches.
{"title":"Residual Graph Convolutional Networks for Zero-Shot Learning","authors":"Jiwei Wei, Yang Yang, Jingjing Li, Lei Zhu, Lin Zuo, Heng Tao Shen","doi":"10.1145/3338533.3366552","DOIUrl":"https://doi.org/10.1145/3338533.3366552","url":null,"abstract":"Most existing Zero-Shot Learning (ZSL) approaches adopt the semantic space as a bridge to classify unseen categories. However, it is difficult to transfer knowledge from seen categories to unseen categories through semantic space, since the correlations among categories are uncertain and ambiguous in the semantic space. In this paper, we formulated zero-shot learning as a classifier weight regression problem. Specifically, we propose a novel Residual Graph Convolution Network (ResGCN) which takes word embeddings and knowledge graph as inputs and outputs a visual classifier for each category. ResGCN can effectively alleviate the problem of over-smoothing and over-fitting. During the test, an unseen image can be classified by ranking the inner product of its visual feature and predictive visual classifiers. Moreover, we provide a new method to build a better knowledge graph. Our approach not only further enhances the correlations among categories, but also makes it easy to add new categories to the knowledge graph. Experiments conducted on the large-scale ImageNet 2011 21K dataset demonstrate that our method significantly outperforms existing state-of-the-art approaches.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130653052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Human Analysis in Multimedia","authors":"Bingkun Bao","doi":"10.1145/3379193","DOIUrl":"https://doi.org/10.1145/3379193","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134473366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Boyu Zhang, Xiangguo Ding, Xiaowen Huang, Yang Cao, J. Sang, Jian Yu
With the rapid development of Online Social Networks (OSNs), it is crucial to construct users' portraits from their dynamic behaviors to address the increasing need for customized information services. Previous work on user attribute inference mainly concentrated on developing advanced features/models or exploiting external information and knowledge, but ignored the contradiction between dynamic behaviors and stable demographic attributes, which results in deviations in user understanding. To address this contradiction and accurately infer user attributes, we propose a Multi-source User Attribute Inference algorithm based on a Hierarchical Auto-encoder (MUAI-HAE). The basic idea is that the patterns shared among the same individual's behaviors on different OSNs are a good indicator of his/her stable demographic attributes. A hierarchical auto-encoder is introduced to realize this idea by discovering the underlying non-linear correlations between different OSNs. The unsupervised scheme for shared pattern learning alleviates the requirement for cross-OSN user accounts and improves practicability. Off-the-shelf classification methods are then utilized to infer user attributes from the derived shared behavior patterns. Experiments on real-world datasets from three OSNs demonstrate the effectiveness of the proposed method.
{"title":"Multi-source User Attribute Inference based on Hierarchical Auto-encoder","authors":"Boyu Zhang, Xiangguo Ding, Xiaowen Huang, Yang Cao, J. Sang, Jian Yu","doi":"10.1145/3338533.3366599","DOIUrl":"https://doi.org/10.1145/3338533.3366599","url":null,"abstract":"With the rapid development of Online Social Networks (OSNs), it is crucial to construct users' portraits from their dynamic behaviors to address the increasing needs for customized information services. Previous work on user attribute inference mainly concentrated on developing advanced features/models or exploiting external information and knowledge but ignored the contradiction between dynamic behaviors and stable demographic attributes, which results in deviation of user understanding To address the contradiction and accurately infer the user attributes, we propose a Multi-source User Attribute Inference algorithm based on Hierarchical Auto-encoder (MUAI-HAE). The basic idea is that: the shared patterns among the same individual's behaviors on different OSNs well indicate his/her stable demographic attributes. The hierarchical autoencoder is introduced to realize this idea by discovering the underlying non-linear correlation between different OSNs. The unsupervised scheme in shared pattern learning alleviates the requirements for the cross-OSN user account and improves the practicability. Off-the-shelf classification methods are then utilized to infer user attributes from the derived shared behavior patterns. The experiments on the real-world datasets from three OSNs demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133059206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the attribute recognition area, attributes that are unrelated in the real world may have a high co-occurrence rate in a dataset due to dataset bias, which forms a misleading relatedness. A neural network, especially a multi-task neural network, trained on such a dataset would learn this relatedness and be misled when used in practice. In this paper, we propose the Share-and-Compete Multi-Task deep learning (SCMTL) model to handle this problem. The model uses adversarial training to enhance competition between unrelated attributes while keeping sharing between related attributes, making the task-specific layers of the multi-task model more specific and thus ruling out the misleading relatedness between unrelated attributes. Experiments performed on elaborately designed datasets show that the proposed model outperforms both the single-task neural network and the traditional multi-task neural network in the situation mentioned above.
{"title":"Excluding the Misleading Relatedness Between Attributes in Multi-Task Attribute Recognition Network","authors":"Sirui Cai, Yuchun Fang","doi":"10.1145/3338533.3366555","DOIUrl":"https://doi.org/10.1145/3338533.3366555","url":null,"abstract":"In the attribute recognition area, attributes that are unrelated in the real world may have a high co-occurrence rate in a dataset due to the dataset bias, which forms a misleading relatedness. A neural network, especially a multi-task neural network, trained on this dataset would learn this relatedness, and be misled when it is used in practice. In this paper, we propose Share-and-Compete Multi-Task deep learning (SCMTL) model to handle this problem. This model uses adversarial training methods to enhance competition between unrelated attributes while keeping sharing between related attributes, making the task-specific layer of the multi-task model to be more specific and thus rule out the misleading relatedness between the unrelated attributes. Experiments performed on elaborately designed datasets show that the proposed model outperforms the single task neural network and the traditional multi-task neural network in the situation mentioned above.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}