Michael Ogezi, B. Hauer, Talgat Omarov, Ning Shi, Grzegorz Kondrak
{"title":"UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation","authors":"Michael Ogezi, B. Hauer, Talgat Omarov, Ning Shi, Grzegorz Kondrak","doi":"10.48550/arXiv.2306.14067","DOIUrl":null,"url":null,"abstract":"We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to translated texts. As the contexts given in the task datasets are extremely short, we also experiment with augmenting these contexts with descriptions generated by a language model. This yields substantial improvements in accuracy. We describe and evaluate additional V-WSD methods which use image generation and text-conditioned image segmentation. Some of our experimental results exceed those of our official submissions on the test set. Our code is publicly available at https://github.com/UAlberta-NLP/v-wsd.","PeriodicalId":444285,"journal":{"name":"International Workshop on Semantic Evaluation","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Semantic Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.14067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to translated texts. As the contexts given in the task datasets are extremely short, we also experiment with augmenting these contexts with descriptions generated by a language model. This yields substantial improvements in accuracy. We describe and evaluate additional V-WSD methods which use image generation and text-conditioned image segmentation. Some of our experimental results exceed those of our official submissions on the test set. Our code is publicly available at https://github.com/UAlberta-NLP/v-wsd.