Generalised Zero-shot Learning with Multi-modal Embedding Spaces

2020 Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2020-11-29. DOI: 10.1109/DICTA51227.2020.9363405

Rafael Felix, M. Sasdelli, Ben Harwood, G. Carneiro

Abstract

Generalised zero-shot learning (GZSL) methods aim to classify both seen and unseen visual classes by leveraging the semantic information of those classes. In the context of GZSL, semantic information is non-visual data such as a text description of the seen and unseen classes. Previous GZSL methods have explored transformations between visual and semantic spaces, as well as the learning of a latent joint visual and semantic space. In these methods, even though learning has explored a combination of spaces (i.e., visual, semantic or joint latent space), inference has tended to use just one of the spaces. By hypothesising that inference must explore all three spaces, we propose a new GZSL method based on a multi-modal classification over visual, semantic and joint latent spaces. Another issue affecting current GZSL methods is the intrinsic bias toward the classification of seen classes, a problem that is usually mitigated by a domain classifier which modulates seen and unseen classification. Our proposed approach replaces the modulated classification with a computationally simpler multi-domain classification based on averaging the multi-modal calibrated classifiers from the seen and unseen domains. Experiments on GZSL benchmarks show that our proposed GZSL approach achieves competitive results compared with the state-of-the-art.
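
The final fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the six classifiers (seen or unseen domain, crossed with the visual, semantic and joint latent spaces) outputs scores over all classes, that calibration is a per-classifier softmax temperature, and that fusion is a plain unweighted mean. None of these details are given in the abstract, and the names `softmax`, `fuse`, `classifier_logits` and `temperatures`, as well as the toy scores, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (assumed form of calibration)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(classifier_logits, temperatures):
    """Average the calibrated probabilities of all classifiers (unweighted mean)."""
    probs = [
        softmax(logits, temperatures[key])
        for key, logits in classifier_logits.items()
    ]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Hypothetical toy scores over three classes: two seen (0, 1) and one unseen (2).
# Keys are (domain, space) pairs; the values are made up for illustration.
classifier_logits = {
    ("seen", "visual"): [2.0, 0.5, 0.1],
    ("seen", "semantic"): [1.5, 0.2, 0.3],
    ("seen", "latent"): [1.0, 0.4, 0.8],
    ("unseen", "visual"): [0.1, 0.2, 2.5],
    ("unseen", "semantic"): [0.3, 0.1, 2.0],
    ("unseen", "latent"): [0.2, 0.3, 3.0],
}
# Assumed calibration: one temperature per classifier (1.0 leaves scores unchanged).
temperatures = {key: 1.0 for key in classifier_logits}

fused = fuse(classifier_logits, temperatures)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Averaging needs no extra domain classifier to gate between seen and unseen predictions, which is the computational simplification the abstract refers to. The paper's actual calibration and classifier design may differ from this sketch.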
