Improving Outfit Recommendation with Co-supervision of Fashion Generation

The World Wide Web Conference. Pub Date: 2019-05-13. DOI: 10.1145/3308558.3313614

Yujie Lin, Pengjie Ren, Zhumin Chen, Z. Ren, Jun Ma, M. de Rijke


Citations: 40

Abstract

The task of fashion recommendation includes two main challenges: visual understanding and visual matching. Visual understanding aims to extract effective visual features. Visual matching aims to model a human notion of compatibility to compute a match between fashion items. Most previous studies rely on recommendation loss alone to guide visual understanding and matching. Although the features captured by these methods describe basic characteristics (e.g., color, texture, shape) of the input items, they are not directly related to the visual signals of the output items (the items to be recommended). This is problematic because the aesthetic characteristics (e.g., style, design) from which the output items could be inferred directly are lacking: features are learned under the recommendation loss alone, where the supervision signal is simply whether two given items match or not. To address this problem, we propose a neural co-supervision learning framework, called the FAshion Recommendation Machine (FARM). FARM improves visual understanding by incorporating the supervision of a generation loss, which we hypothesize better encodes aesthetic information. FARM enhances visual matching by introducing a novel layer-to-layer matching mechanism that fuses aesthetic information more effectively while avoiding an overemphasis on generation quality at the expense of recommendation performance. Extensive experiments on two publicly available datasets show that FARM outperforms state-of-the-art models on outfit recommendation in terms of AUC and MRR. Detailed analyses of generated and recommended items demonstrate that FARM can encode better features and generate high-quality images as references to improve recommendation performance.
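As a rough illustration of the ideas named in the abstract, the following Python sketch combines a recommendation loss with a generation loss and computes AUC and MRR. It is not the FARM implementation: the pairwise ranking loss, the mean-squared-error generation term, the `gen_weight` parameter, and all function names are assumptions chosen for illustration, since the abstract does not specify the exact losses or how they are weighted.

```python
import math


def pairwise_ranking_loss(pos_score, neg_score):
    """-log(sigmoid(pos - neg)), written in a numerically stable form.

    Assumption: a BPR-style loss stands in for the recommendation loss.
    """
    z = neg_score - pos_score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def generation_loss(generated, target):
    """Mean squared error between generated and target pixel values.

    Assumption: MSE stands in for whatever generation loss is used.
    """
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / len(target)


def co_supervised_loss(pos_score, neg_score, generated, target, gen_weight=0.5):
    """Recommendation loss plus a weighted generation loss.

    gen_weight is a hypothetical knob that keeps the generation term
    from dominating the recommendation term.
    """
    rec = pairwise_ranking_loss(pos_score, neg_score)
    gen = generation_loss(generated, target)
    return rec + gen_weight * gen


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mrr(ranked_labels):
    """Mean reciprocal rank of the first relevant item in each ranked list.

    Each inner list holds 0/1 relevance labels in ranked order; a list
    with no relevant item contributes 0.
    """
    total = 0.0
    for labels in ranked_labels:
        for rank, relevant in enumerate(labels, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


if __name__ == "__main__":
    print(co_supervised_loss(2.0, 0.5, [0.2, 0.4], [0.1, 0.5]))
    print(auc([0.9, 0.8], [0.1, 0.85]))
    print(mrr([[0, 1, 0], [1, 0, 0]]))
```

In this sketch, setting `gen_weight` to 0 recovers training on the recommendation signal alone, which is the setting the abstract argues is insufficient for capturing aesthetic characteristics.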



Latest papers from this venue

Decoupled Smoothing on Graphs
Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis
Augmenting Knowledge Tracing by Considering Forgetting Behavior
Enhancing Fashion Recommendation with Visual Compatibility Relationship
Judging a Book by Its Cover: The Effect of Facial Perception on Centrality in Social Networks