Learning Words by Drawing Images

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2029-2038. Published 2019-06-01. DOI: 10.1109/CVPR.2019.00213

Dídac Surís, Adrià Recasens, David Bau, David F. Harwath, James R. Glass, A. Torralba

Abstract

We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images. Building upon recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method to use such GAN-generated images to train a model using a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images that focuses training on image pairs that differ in a specific, informative way. Training is done without additional supervision beyond the spoken captions and the GAN. We find that training that takes advantage of GAN-generated edited examples results in improvements in the model's ability to learn attributes compared to previous results. Our proposed learning framework also results in models that can associate spoken words with some abstract visual concepts such as color and size.
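The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes a spoken caption and two GAN-generated images have already been encoded into fixed-length embedding vectors. The abstract does not state the paper's encoders, similarity function or margin, so the following are illustrative assumptions rather than the authors' method: the helper names `cosine_similarity` and `triplet_loss`, the use of cosine similarity, and the margin of 0.2. Here the caption is the anchor and the image it describes is the positive. The negative is a GAN-edited copy of that image in which one attribute has been changed.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(
    audio_emb: Sequence[float],
    image_emb: Sequence[float],
    edited_image_emb: Sequence[float],
    margin: float = 0.2,  # assumed value; not given in the abstract
) -> float:
    """Hinge triplet loss (hypothetical helper).

    The spoken caption (anchor) should be more similar to the image it
    describes (positive) than to a GAN-edited copy of that image
    (negative) by at least `margin`. Returns 0.0 when that holds.
    """
    pos = cosine_similarity(audio_emb, image_emb)
    neg = cosine_similarity(audio_emb, edited_image_emb)
    return max(0.0, margin - pos + neg)


if __name__ == "__main__":
    caption = [1.0, 0.0]
    original = [1.0, 0.0]
    # Edited image the caption no longer matches: loss is zero.
    print(triplet_loss(caption, original, [0.0, 1.0]))
    # Edited image indistinguishable from the original: loss equals the margin.
    print(triplet_loss(caption, original, [1.0, 0.0]))
```

In this reading, the positive and the negative differ only in the edited attribute, such as color or size. The loss can therefore only be reduced by embeddings that capture that attribute. This is an interpretation of the abstract's motivation for GAN-edited pairs, not a detail the abstract spells out.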



