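A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach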

IF 0.5 · Zone 3 (Literature) · JCR: N/A · LANGUAGE & LINGUISTICS · Acta Linguistica Academica · Pub Date: 2022-12-12 · DOI: 10.1556/2062.2022.00579
Enikő Héja, Noémi Ligeti-Nagy
Citations: 0

Abstract

A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach
The Word-in-Context (WiC) corpus, which forms part of the SuperGLUE benchmark, focuses on a specific sense disambiguation task: deciding whether two occurrences of a given target word in two different contexts convey the same meaning. Unfortunately, the WiC database exhibits relatively low inter-annotator agreement, which implies that the meaning discrimination task is not well defined even for humans. The present paper aims to tackle this problem by anchoring semantic information to observable surface data. To this end, we experimented with a graph-based distributional approach in which both sparse and dense adjectival vector representations served as input. In line with our expectations, the algorithm is able to anchor the semantic information to contextual data and can therefore provide clear and explicit criteria for when the same meaning should be assigned to the occurrences. Moreover, since the method does not rely on any external knowledge base, it should be suitable for any low- or medium-resourced language.
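
The abstract does not spell out the algorithm, so the following is only a rough sketch of what a graph-based distributional criterion for "same meaning" could look like, not the authors' method. It assumes sparse vectors (noun co-occurrence counts) for adjectives that are distributionally close to a target adjective, links them in a graph when their cosine similarity reaches a threshold, and treats each connected component as one candidate sense. Two occurrences of the target then count as sharing a meaning when the nouns they modify point to the same component. The toy counts, the target "cold", the threshold of 0.5, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: noun co-occurrence counts for adjectives assumed to be
# distributionally close to the target adjective "cold".
NEIGHBOURS = {
    "chilly": {"water": 3, "wind": 4},
    "icy": {"water": 2, "wind": 2, "road": 1},
    "unfriendly": {"stare": 3, "reply": 4},
    "hostile": {"stare": 2, "reply": 3},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0) for feature, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sense_clusters(vectors, threshold):
    """Link similar adjectives and return the graph's connected components."""
    graph = {word: set() for word in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    clusters, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def sense_of(noun, clusters, vectors):
    """Index of the cluster whose members co-occur most with the noun, or None."""
    scores = [sum(vectors[adj].get(noun, 0) for adj in cluster) for cluster in clusters]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    return best if best is not None and scores[best] > 0 else None


def same_meaning(noun_a, noun_b, clusters, vectors):
    """Explicit criterion: both context nouns must map to the same cluster."""
    sense_a = sense_of(noun_a, clusters, vectors)
    sense_b = sense_of(noun_b, clusters, vectors)
    return sense_a is not None and sense_a == sense_b


CLUSTERS = sense_clusters(NEIGHBOURS, threshold=0.5)
print(same_meaning("water", "wind", CLUSTERS, NEIGHBOURS))   # True
print(same_meaning("water", "stare", CLUSTERS, NEIGHBOURS))  # False
```

In this toy setting "cold water" and "cold wind" fall into the same component, while "cold water" and "cold stare" do not. The decision rests only on observable co-occurrence data, which is the kind of surface anchoring the abstract argues for; a real system would need corpus-derived vectors and a principled choice of threshold and graph partitioning.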
Source journal
Acta Linguistica Academica (Arts and Humanities: Literature and Literary Theory)
CiteScore: 1.00
Self-citation rate: 20.00%
Articles published: 20
About the journal: Acta Linguistica Academica publishes papers on general linguistics. Papers presenting empirical material must have strong theoretical implications. The scope of the journal is not restricted to the core areas of linguistics; it also covers areas such as socio- and psycholinguistics, neurolinguistics, discourse analysis, the philosophy of language, language typology, and formal semantics. The journal also publishes book and dissertation reviews and advertisements.