{"title":"Hybrid Joint Embedding with Intra-Modality Loss for Image-Text Matching","authors":"Doaa B. Ebaid, A. El-Zoghabi, Magda M. Madbouly","doi":"10.1109/ISCMI56532.2022.10068471","DOIUrl":null,"url":null,"abstract":"Image-text(caption) matching has become a regular evaluation of joint-embedding models that combine vision and language. This task comprises ranking the data of one modality (images) based on a Text query (Image Retrieval), and ranking texts by relevance for an image query (Text Retrieval). The current joint embedding approaches use symmetric similarity measurement, due to that order embedding is not taken in consideration. In addition to that, in image-text matching, the used losses ignore the intra similarity in a certain modality that explores the relation between the candidates in the same modality. In spite of, the important role of intra information in the embedding. In this paper, we proposed a hybrid joint embedding approach that combines between distance preserving which based on symmetric distance and order preserving that based on asymmetric distance to improve image-text matching. In addition to that we propose an intra loss function to enrich the embedding with intra-modality information. We evaluate our embedding approach on the baseline model on Flickr30K dataset. The proposed loss shows a significant enhancement in matching task.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"37 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCMI56532.2022.10068471","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Image-text (caption) matching has become a standard evaluation task for joint-embedding models that combine vision and language. The task consists of ranking items of one modality by relevance to a query from the other: ranking images given a text query (image retrieval) and ranking texts given an image query (text retrieval). Current joint-embedding approaches use symmetric similarity measures, so order embedding is not taken into consideration. Moreover, the losses used in image-text matching ignore intra-modality similarity, i.e., the relations among candidates within the same modality, despite the important role intra-modality information plays in the embedding. In this paper, we propose a hybrid joint-embedding approach that combines distance preservation, based on a symmetric distance, with order preservation, based on an asymmetric distance, to improve image-text matching. In addition, we propose an intra-modality loss function to enrich the embedding with intra-modality information. We evaluate our embedding approach on a baseline model on the Flickr30K dataset. The proposed loss yields a significant improvement on the matching task.
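The abstract gives no formulas, so the PyTorch sketch below is only an illustration of the two ideas it names, not the paper's actual method: a hybrid similarity that mixes a symmetric cosine term with the asymmetric order-embedding violation of Vendrov et al. (2016), and a separate intra-modality hinge term. All function names, the fusion weight `alpha`, and the margins are hypothetical assumptions, not definitions from the paper.

```python
import torch
import torch.nn.functional as F

def symmetric_sim(img, txt):
    # Symmetric cosine similarity: sim(i, t) == sim(t, i), so it
    # carries no order information. Returns a (B, B) matrix.
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    return img @ txt.t()

def order_violation(img, txt):
    # Asymmetric order-embedding penalty (Vendrov et al., 2016):
    # E(x, y) = ||max(0, y - x)||^2, which is zero only when the
    # caption embedding lies coordinate-wise below the image embedding.
    diff = txt.unsqueeze(0) - img.unsqueeze(1)   # (B_img, B_txt, D)
    return diff.clamp(min=0).pow(2).sum(dim=-1)  # (B_img, B_txt)

def hybrid_sim(img, txt, alpha=0.5):
    # Hypothetical fusion of the two distances; higher is more similar.
    # The abstract states only that symmetric and asymmetric distances
    # are combined, not how, so this weighted sum is an assumption.
    return alpha * symmetric_sim(img, txt) - (1 - alpha) * order_violation(img, txt)

def triplet_inter_loss(sim, margin=0.2):
    # Standard bidirectional hinge loss with hardest negatives
    # (VSE++-style), applied to the hybrid similarity matrix.
    pos = sim.diag().view(-1, 1)
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_txt = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)     # image -> text
    cost_img = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0) # text -> image
    return cost_txt.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()

def intra_loss(img, txt, margin=0.2):
    # Hypothetical intra-modality term: each embedding should be no
    # closer to other candidates of its own modality than to its
    # matching cross-modal pair.
    pos = symmetric_sim(img, txt).diag().view(-1, 1)
    mask = torch.eye(img.size(0), dtype=torch.bool, device=img.device)
    cost_i = (margin + symmetric_sim(img, img) - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t = (margin + symmetric_sim(txt, txt) - pos).clamp(min=0).masked_fill(mask, 0)
    return cost_i.mean() + cost_t.mean()
```

Under these assumptions, a plausible training objective would combine the two terms, e.g. `loss = triplet_inter_loss(hybrid_sim(img, txt)) + 0.1 * intra_loss(img, txt)`, with the weighting treated as a tunable hyperparameter.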