{"title":"A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach","authors":"Enikő Héja, Noémi Ligeti-Nagy","doi":"10.1556/2062.2022.00579","DOIUrl":null,"url":null,"abstract":"The Word-in-Context corpus, which forms part of the SuperGLUE benchmark dataset, focuses on a specific sense disambiguation task: it has to be decided whether two occurrences of a given target word in two different contexts convey the same meaning or not. Unfortunately, the WiC database exhibits a relatively low consistency in terms of inter-annotator agreement, which implies that the meaning discrimination task is not well defined even for humans. The present paper aims at tackling this problem through anchoring semantic information to observable surface data. For doing so, we have experimented with a graph-based distributional approach, where both sparse and dense adjectival vector representations served as input. According to our expectations the algorithm is able to anchor the semantic information to contextual data, and therefore it is able to provide clear and explicit criteria as to when the same meaning should be assigned to the occurrences. Moreover, since this method does not rely on any external knowledge base, it should be suitable for any low- or medium-resourced language.","PeriodicalId":37594,"journal":{"name":"Acta Linguistica Academica","volume":" ","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Linguistica Academica","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1556/2062.2022.00579","RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 0
Abstract
The Word-in-Context corpus, which forms part of the SuperGLUE benchmark dataset, focuses on a specific sense disambiguation task: it has to be decided whether two occurrences of a given target word in two different contexts convey the same meaning or not. Unfortunately, the WiC database exhibits a relatively low consistency in terms of inter-annotator agreement, which implies that the meaning discrimination task is not well defined even for humans. The present paper aims at tackling this problem through anchoring semantic information to observable surface data. For doing so, we have experimented with a graph-based distributional approach, where both sparse and dense adjectival vector representations served as input. According to our expectations the algorithm is able to anchor the semantic information to contextual data, and therefore it is able to provide clear and explicit criteria as to when the same meaning should be assigned to the occurrences. Moreover, since this method does not rely on any external knowledge base, it should be suitable for any low- or medium-resourced language.
word -in- context语料库是SuperGLUE基准数据集的一部分,它专注于一个特定的意义消歧义任务:它必须确定给定目标单词在两个不同的上下文中的两次出现是否传达相同的含义。不幸的是,WiC数据库在注释者之间的一致性方面表现出相对较低的一致性,这意味着即使对于人类来说,意义区分任务也没有很好地定义。本文旨在通过将语义信息锚定到可观测表面数据来解决这一问题。为此,我们尝试了一种基于图的分布方法,其中稀疏和密集的形容词向量表示都作为输入。根据我们的期望,该算法能够将语义信息锚定到上下文数据,因此它能够提供清晰明确的标准,以确定何时应该为出现的事件分配相同的含义。此外,由于这种方法不依赖于任何外部知识库,因此它应该适用于任何低资源或中等资源的语言。
期刊介绍:
Acta Linguistica Academica publishes papers on general linguistics. Papers presenting empirical material must have strong theoretical implications. The scope of the journal is not restricted to the core areas of linguistics; it also covers areas such as socio- and psycholinguistics, neurolinguistics, discourse analysis, the philosophy of language, language typology, and formal semantics. The journal also publishes book and dissertation reviews and advertisements.